Parse and segment PDFs, images, and Office files into meaningful sections using advanced AI models with flexible customization options.
curl -X 'POST' \
'https://prod.visionapi.unsiloed.ai/parse' \
-H 'accept: application/json' \
-H 'api-key: your-api-key' \
-F 'file=@document.pdf;type=application/pdf' \
-F 'use_high_resolution=true' \
-F 'layout_analysis=smart_layout_detection' \
-F 'ocr_strategy=auto_detection' \
-F 'ocr_engine=UnsiloedHawk' \
-F 'extract_strikethrough=false' \
-F 'merge_tables=true' \
-F 'segment_filter=all' \
-F 'validate_segments=["Table","Picture","Formula"]' \
-F 'export_format=["docx"]' \
-F 'segment_analysis={"Table":{"html":"VLM","markdown":"VLM","extended_context":true,"crop_image":"All","model_id":"us_table_v2"}}'
# Alternative: Use presigned URL instead of file upload
# Replace the file parameter with url parameter:
# -F 'url=https://your-bucket.s3.amazonaws.com/document.pdf?signature=...' \
{
  "job_id": "e77a5c42-4dc1-44d0-a30e-ed191e8a8908",
  "status": "Starting",
  "file_name": "document.pdf",
  "created_at": "2025-07-18T10:42:10.545832520Z",
  "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "credit_used": 5,
  "quota_remaining": 23695,
  "merge_tables": false
}
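The same request in Python, as a sketch using the requests library (requests sets the multipart Content-Type header, including the boundary, automatically; JSON-valued parameters are serialized to strings, mirroring the curl example):

import json
import requests

API_KEY = "your-api-key"

segment_analysis = {
    "Table": {
        "html": "VLM",
        "markdown": "VLM",
        "extended_context": True,
        "crop_image": "All",
        "model_id": "us_table_v2",
    }
}

with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://prod.visionapi.unsiloed.ai/parse",
        headers={"api-key": API_KEY, "accept": "application/json"},
        files={"file": ("document.pdf", f, "application/pdf")},
        data={
            "use_high_resolution": "true",
            "layout_analysis": "smart_layout_detection",
            "ocr_strategy": "auto_detection",
            "ocr_engine": "UnsiloedHawk",
            "merge_tables": "true",
            # Dict- and list-valued parameters are passed as JSON strings.
            "validate_segments": '["Table","Picture","Formula"]',
            "segment_analysis": json.dumps(segment_analysis),
        },
    )

job = response.json()
print(job["job_id"], job["status"])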
How It Works:
1. POST /parse with your file and configuration — the API uploads the document and creates a parse job.
2. GET /parse/{job_id} to track progress and retrieve results.
The standalone upload endpoint (POST /v2/parse/upload) decouples document delivery from job creation for faster uploads, larger file sizes, and higher throughput.

Key Parameters

Provide either the file or url parameter. Both cannot be provided simultaneously.
file: Document file to process. Required if url is not provided.
url: Presigned or public URL of the document. Required if file is not provided.
use_high_resolution: Defaults to true.
layout_analysis:
"smart_layout_detection" (default): Intelligently identifies document structure, headers, sections, and content relationships across the entire document using bounding boxes.
"page_by_page": Analyzes each page independently as a single segment. Faster for simple documents.
"advanced_layout_detection": Uses a vision-language model for exhaustive page segmentation. Detects 14 element types (Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, KeyValuePair, Signature, Seal). Best for visually complex or unusual layouts.
ocr_strategy:
"auto_detection" (default): Intelligently detects bad-quality PDFs, scanned documents, and images, then applies OCR only where needed.
"force_ocr": Runs OCR on the entire document regardless of quality.
ocr_engine:
"UnsiloedBeta" (default): Handles rotated/warped text and irregular bounding boxes.
"UnsiloedHawk": Higher accuracy for complex layouts and mixed content.
"UnsiloedStorm": Enterprise-grade accuracy optimized for 50+ languages.
agentic_ocr:
"standard": Good balance of speed and accuracy.
"advanced": Higher quality, best for complex layouts, rotated text, and mixed-language content.
extract_strikethrough: Defaults to false.
merge_tables: Defaults to false.
validate_segments: Segment types to validate with VLM, e.g. ["Table", "Formula", "Picture"]. Defaults to [].
Legacy table validation: prefer validate_segments: ["Table"] instead. Defaults to false.
segment_filter: Comma-separated segment types to keep, or "all" to include everything. Defaults to "all".
Available segment types:
table: Tabular data segments
picture: Image and graphic segments
formula: Mathematical equations
text: Regular text content
sectionheader: Section headers
title: Document titles
listitem: List items
caption: Image captions
footnote: Footnotes
pageheader: Page headers
pagefooter: Page footers
Examples: "table", "table,picture", "table,formula", "picture,formula".
output_fields: Set individual fields to false to exclude them and reduce response size. All fields default to true.
Available fields:
html: HTML representation of segments
markdown: Markdown representation of segments
ocr: Raw OCR text data with bounding boxes and confidence scores
image: Cropped segment images (base64 encoded)
content: Text content of segments
bbox: Bounding box coordinates
confidence: Confidence scores for segments
embed: Vector embeddings / embed text
Example: {"html": true, "markdown": true, "ocr": false, "image": false}.
segment_analysis: Per-segment-type processing configuration, for example:
{
"Table": {"html": "VLM", "markdown": "VLM", "model_id": "us_table_v2"},
"Picture": {"html": "VLM", "markdown": "VLM", "model_id": "nova"},
"Formula": {"html": "Auto", "markdown": "VLM", "model_id": "nova"}
}
html: "VLM" or "Auto"markdown: "VLM" or "Auto"model_id (Table): "astra", "us_table_v1", "us_table_v2"model_id (Picture/Formula): "nova", "luna", "sol"use_table_ocr (Table only): Advanced OCR optimized for tabular data. Better handles bordered cells, gridlines, and complex table layouts.vlm: Custom prompt for the VLM model. Use this to give the model specific instructions for extracting or describing these segment types.segment_analysis. If both are provided, segment_processing takes precedence."1-5", "2,4,6", "[1,3,5]". Defaults to all pages."Unsiloed" (default) — e.g., PageHeader, ListItem, Picture. "Other" — alternative names e.g., Header, List Item, Figure.false.false.false.false.exports field of the response. Currently supported: ["docx"]."Continue" (default) — skips failed pages and continues processing the rest. "Fail" — aborts the entire job on the first error.High-Accuracy Processing
{
  "use_high_resolution": true,
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "force_ocr",
  "merge_tables": true,
  "validate_segments": ["Table", "Picture", "Formula"],
  "segment_analysis": {
    "Table": {
      "html": "VLM",
      "markdown": "VLM",
      "extended_context": true,
      "crop_image": "All",
      "model_id": "us_table_v2"
    }
  }
}
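Presets like these can be sent directly as multipart form fields. A minimal helper sketch (the function name is illustrative, not part of any SDK): dict- and list-valued parameters are serialized to JSON strings, booleans to "true"/"false".

import json

def to_form_fields(preset: dict) -> dict:
    """Flatten a preset config into multipart form fields (illustrative helper)."""
    fields = {}
    for key, value in preset.items():
        if isinstance(value, bool):
            fields[key] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            fields[key] = json.dumps(value)
        else:
            fields[key] = str(value)
    return fields

# e.g. requests.post(..., files={"file": ...}, data=to_form_fields(high_accuracy_preset))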
Fast Processing
{
  "use_high_resolution": false,
  "layout_analysis": "page_by_page",
  "ocr_strategy": "auto_detection",
  "merge_tables": false
}
Financial Documents (Tables + Charts)
{
  "merge_tables": true,
  "segment_filter": "table,picture",
  "validate_segments": ["Table", "Picture"],
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "auto_detection",
  "segment_analysis": {
    "Table": {
      "html": "VLM",
      "markdown": "VLM",
      "model_id": "us_table_v2"
    }
  }
}
Data Extraction Only (Tables)
{
  "merge_tables": true,
  "segment_filter": "table",
  "validate_segments": ["Table"],
  "output_fields": {
    "html": true,
    "markdown": true,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": false,
    "confidence": false
  }
}
Academic/Research Documents
{
  "use_high_resolution": true,
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "auto_detection",
  "xml_citation": true
}
Scanned Documents
{
  "use_high_resolution": true,
  "ocr_strategy": "force_ocr",
  "layout_analysis": "smart_layout_detection"
}
Output Fields Optimization
{
  "output_fields": {
    "html": false,
    "markdown": false,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": false,
    "confidence": false,
    "embed": true
  }
}
{
  "output_fields": {
    "html": false,
    "markdown": true,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": true,
    "confidence": false,
    "embed": true
  }
}
To include all available data, omit output_fields or set all fields to true.

Input Methods

File upload (file parameter): Upload the document file directly as multipart/form-data.
URL (url parameter): Provide a publicly accessible URL or presigned URL to the document.
Provide either file or url, but not both. When using url, the document will be downloaded from the provided URL before processing.

Layout Analysis

The layout_analysis parameter controls how the document is analyzed and segmented:
"smart_layout_detection" (default): Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking for complex documents.
"page_by_page": Treats each page as a single segment. Faster processing, ideal for simple documents without complex layouts.
"advanced_layout_detection": Uses a vision-language model to exhaustively segment each page into 14 element types (including KeyValuePair, Signature, and Seal in addition to the standard set). Recommended for documents with dense, non-standard, or visually complex layouts where VGT-based detection misses regions.
The agentic_ocr parameter enables per-segment OCR enhancement after layout detection, yielding higher accuracy on small text, stylized fonts, and mathematical formulas.
Values:
"standard": Fast, good for most documents.
"advanced": Higher quality, better for complex layouts, rotated or irregular text, and multilingual content.

OCR Strategy

The ocr_strategy parameter controls optical character recognition processing:
"auto_detection" (default): Intelligently determines when OCR is needed based on the document content. Balances accuracy and performance.
"force_ocr": Applies OCR to all content regardless of existing text layer. Use this for scanned documents or when maximum text extraction is required.
The merge_tables parameter enables merging of tables that span multiple pages:
How It Works:
{
  "merge_tables": true
}
The xml_citation parameter enables automatic extraction and linking of citations from research papers, academic articles, and scientific documents.
How It Works:
{
  "xml_citation": true
}
When enabled, the response includes a metadata field with structured citation data:
{
  "metadata": {
    "citations": [
      {
        "id": 1,
        "title": "Deep Learning for NLP",
        "authors": ["John Smith", "Jane Doe"],
        "year": "2021",
        "journal": "Nature",
        "volume": "15",
        "pages": "123-145",
        "doi": "10.1000/example"
      }
    ],
    "document_metadata": {
      "title": "Document Title",
      "authors": ["Author Name"]
    }
  }
}
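A short sketch of reading this metadata from a completed job (assumes only the shape shown above):

def list_citations(result: dict) -> None:
    """Print citations extracted when xml_citation is enabled."""
    for cite in result.get("metadata", {}).get("citations", []):
        authors = ", ".join(cite.get("authors", []))
        print(f"[{cite['id']}] {cite.get('title')} ({cite.get('year')}) - {authors}")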
"As shown by Chen et al. (2021)...""As shown by [Chen et al. (2021)](#ref-5)..."segment_filter parameter allows you to filter the output to include only specific segment types, reducing response size and focusing on relevant content:
How It Works:
"all" (default): Include all segment types"table": Only table segments"picture": Only image/graphic segments"table,picture": Tables and pictures only"table,formula": Tables and formulas onlytable, picture, formula, text, sectionheader, title, listitem, caption, footnote, pageheader, pagefooter{
"segment_filter": "table,picture"
}
The output_fields parameter allows you to control which fields are included in the API response. This is useful for reducing response size, improving performance, and optimizing bandwidth usage when you don't need all available data.
Available Fields:
html (default: true): Include HTML representation of segments
markdown (default: true): Include Markdown representation of segments
ocr (default: true): Include OCR results with bounding boxes and confidence scores
image (default: true): Include cropped segment images (base64 encoded)
content (default: true): Include text content of segments
bbox (default: true): Include bounding box coordinates
confidence (default: true): Include confidence scores for segments
embed (default: true): Include embed text in chunk responses
Set fields to false to exclude them from the response. Fields not specified default to true for backward compatibility.
Example Configuration:
{
  "html": false,
  "markdown": true,
  "ocr": false,
  "image": false,
  "content": true,
  "bbox": true,
  "confidence": false,
  "embed": true
}
Performance tips:
Excluding image and html can significantly reduce payload size.
Set fields to false when you only need basic content.
Exclude image and ocr when processing text content.
Keep only content and embed when generating embeddings.

Segment Analysis

The segment_analysis parameter allows you to customize how different segment types are processed, including HTML/Markdown generation strategies and which field should populate the content field.
Available Segment Types:
You can configure processing for any of the following segment types:
Table: Tabular data segments
Picture: Image and graphic segments
Formula: Mathematical equations
Title: Document titles
SectionHeader: Section headers
Text: Regular text content
ListItem: List items
Caption: Image captions
Footnote: Footnotes
PageHeader: Page headers
PageFooter: Page footers
Page: Full page segments

Configuration Options:
html: Generation strategy for HTML representation
"Auto" (default): Automatically determine the best method
"VLM": Use VLM to generate HTML
markdown: Generation strategy for Markdown representation
"Auto" (default): Automatically determine the best method
"VLM": Use VLM to generate Markdown
content_source: Defines which field should populate the content field in the response
"OCR" (default): Use OCR text for content
"HTML": Use HTML representation as content
"Markdown": Use Markdown representation as content
model_id (Table segments only): Specifies which AI model to use for table processing
"us_table_v1": Standard table processing model
"us_table_v2": Enhanced table processing model with improved accuracy
vlm: Custom prompt for the VLM model. Use this to give the model specific instructions for extracting or describing these segment types.

Example:
{
"Table": {
"html": "VLM",
"markdown": "VLM",
"content_source": "HTML",
"model_id": "us_table_v2",
"vlm": "Preserve all merged cells. Use empty strings for missing values."
},
"Picture": {
"html": "VLM",
"markdown": "VLM",
"content_source": "Markdown",
"vlm": "Focus on chart axes, legend labels, and key data trends."
}
}
How content_source Works:
The content_source parameter determines which field’s value will be used to populate the content field in the segment response:
content_source is set to "HTML", the content field will contain the HTML representation, and the separate html and markdown fields will be emptycontent_source is set to "Markdown", the content field will contain the Markdown representation, and the separate html and markdown fields will be emptycontent_source is set to "OCR" (default), the content field contains OCR text, and html and markdown fields are populated separatelycontent_source: "HTML" for Table segments when you want HTML-formatted table data directly in the content fieldcontent_source: "Markdown" for Picture segments when you want Markdown-formatted descriptions in the content field"VLM" for both html and markdown generation strategies to get AI-enhanced representations in those fieldsGET /parse/{job_id} to poll for results."Starting" on creation."unknown" when a URL was provided.merge_tables value).curl -X 'POST' \
curl -X 'GET' \
'https://prod.visionapi.unsiloed.ai/parse/{job_id}' \
-H 'accept: application/json' \
-H 'api-key: your-api-key'
import requests
import time

def get_parse_results(job_id, api_key):
    """Monitor job and retrieve results when complete"""
    headers = {"api-key": api_key}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"

    # Poll for completion
    while True:
        response = requests.get(status_url, headers=headers)
        if response.status_code == 200:
            status_data = response.json()
            print(f"Job Status: {status_data['status']}")

            if status_data['status'] == 'Succeeded':
                return status_data  # Results are included in the same response
            elif status_data['status'] == 'Failed':
                raise Exception(f"Job failed: {status_data.get('message', 'Unknown error')}")

        time.sleep(5)  # Check every 5 seconds

# Usage
job_id = "e77a5c42-4dc1-44d0-a30e-ed191e8a8908"
results = get_parse_results(job_id, "your-api-key")
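The helper above polls indefinitely; a bounded variant (a sketch reusing the imports above):

def get_parse_results_bounded(job_id, api_key, max_wait_seconds=600, interval=5):
    """Like get_parse_results, but stops waiting after max_wait_seconds."""
    headers = {"api-key": api_key}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"
    deadline = time.time() + max_wait_seconds
    while time.time() < deadline:
        status_data = requests.get(status_url, headers=headers).json()
        if status_data["status"] == "Succeeded":
            return status_data
        if status_data["status"] == "Failed":
            raise RuntimeError(f"Job failed: {status_data.get('message', 'Unknown error')}")
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait_seconds}s")

A succeeded job returns a payload like the following: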
{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Succeeded",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "started_at": "2025-10-22T06:51:16.966136Z",
  "finished_at": "2025-10-22T06:57:19.821541Z",
  "total_chunks": 25,
  "chunks": [
    {
      "segments": [
        {
          "segment_type": "Title",
          "content": "Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "image": null,
          "page_number": 1,
          "segment_id": "cc5f8dff-31be-4ccf-885d-4f9062fcee17",
          "confidence": 0.90187776,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<h1>Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)</h1>",
          "markdown": "# Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "bbox": {
            "left": 72.92226,
            "top": 62.030334,
            "width": 230.36308,
            "height": 55.395317
          },
          "ocr": [
            {
              "bbox": {
                "left": 63.753525,
                "top": 5.395447,
                "width": 164.45312,
                "height": 42.757812
              },
              "text": "Disinvestment",
              "confidence": 0.9999992
            }
          ]
        },
        {
          "segment_type": "Text",
          "content": "Background and context information about the disinvestment process...",
          "image": null,
          "page_number": 1,
          "segment_id": "9d60e48b-77ba-4a23-a0ac-95ee13c615ec",
          "confidence": 0.88558982,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<p>Background and context information about the disinvestment process...</p>",
          "markdown": "Background and context information about the disinvestment process...",
          "bbox": {
            "left": 486.9685,
            "top": 139.61847,
            "width": 241.29932,
            "height": 48.451706
          },
          "ocr": [
            {
              "bbox": {
                "left": 50.9729,
                "top": 3.4557495,
                "width": 46.046875,
                "height": 19.734375
              },
              "text": "Background",
              "confidence": 0.99999654
            }
          ]
        }
      ]
    }
  ]
}
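Given a payload shaped like the example above, a sketch of pulling out table markdown (field names match the response shown; nothing else is assumed):

def collect_tables(result: dict) -> list[str]:
    """Collect markdown for every Table segment in a parse result."""
    tables = []
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            if segment.get("segment_type") == "Table" and segment.get("markdown"):
                tables.append(segment["markdown"])
    return tables

# Usage with the polling helper above:
# results = get_parse_results(job_id, "your-api-key")
# for table_md in collect_tables(results):
#     print(table_md)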
Table Models

Choose the table processing model via model_id in the segment_analysis parameter:
us_table_v1: Standard table processing model
us_table_v2: Enhanced table processing model with improved accuracy

Error Responses

Validation error: neither the file nor the url parameter was provided, or both file and url were provided simultaneously.
Insufficient credits (402): Not enough page credits remaining.
Usage limit (429): Billing usage cap reached. Returns plain text: Usage limit exceeded. No Retry-After header.
Rate limit (429): Org exceeded 60 requests / 60s sliding window. Returns plain text: Rate limit exceeded. A Retry-After header may be present depending on the infrastructure layer (Envoy/Istio), but is not set by the application.
Internal error (500): An unexpected error occurred during processing.
Queue full (503): Job queue is at capacity. Retry after the duration indicated in the Retry-After header.
Access revoked (403): Access has been revoked.

Authorizations

api-key (header, required): API key for authentication. Use 'Bearer <your_api_key>'
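Before the full parameter reference, here is a sketch of client-side retry handling for the 429/503 responses above (the backoff values are illustrative, not prescribed by the API; pass the file body as bytes so it can be re-sent on retry):

import time
import requests

def post_with_retry(url, headers, files, data, max_attempts=5):
    """POST with basic handling for 429 (rate limit) and 503 (queue full)."""
    response = None
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, files=files, data=data)
        if response.status_code not in (429, 503):
            return response  # success, or an error retrying will not fix
        # 503 carries a Retry-After header; the application does not set one
        # on 429, so fall back to exponential backoff (illustrative choice).
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else 2 ** attempt)
    return response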
Parameter Reference
Request body for POST /parse (multipart/form-data).
Provide either file (binary upload) or url (presigned/public URL) — not both.
file: Document file to process. Required if url is not provided.
Supported formats: PDF, PNG, JPEG, TIFF, PPT, PPTX, DOC, DOCX, XLS, XLSX.
agentic_ocr: Enable per-segment agentic OCR for higher accuracy. Pass "standard" or "advanced".
JSON object for chunk processing configuration.
Detect checkboxes in document images. Defaults to false.
Error handling strategy for non-critical processing errors.
Continue (default) — proceed despite errors (e.g., LLM refusals).
Fail — stop and fail the task on any error.
Seconds until the task and its output are deleted. Defaults to the plan expiration time.
export_format: Export format(s) to generate after processing.
When set, the pipeline generates the requested export files after parsing completes.
The exported files are available as presigned URLs in the exports field of the response.
Currently supported: ["docx"] (allowed value: docx).
Extract structured data from charts and graphs. Defaults to false.
Transfer text color from the PDF text layer to OCR results. Defaults to false.
Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false.
extract_strikethrough: Preserve strikethrough formatting in HTML/Markdown output. Defaults to false.
layout_analysis: Layout analysis strategy.
smart_layout_detection (default) — detects layout elements using bounding boxes.
page_by_page — treats each page as a single segment; faster for simple documents.
advanced_layout_detection — vision-language-model segmentation into 14 element types (see above).
JSON object for LLM processing configuration.
merge_tables: Merge tables that span multiple pages into a single unified structure. Defaults to false.
ocr_engine: OCR engine to use for text recognition.
UnsiloedBeta (default) — handles irregular bounding boxes, rotated/warped text.
UnsiloedHawk — higher accuracy, better for complex layouts.
UnsiloedStorm — enterprise-grade accuracy, optimized for 50+ languages.
ocr_strategy: OCR strategy.
auto_detection (default) — applies OCR only where needed.
force_ocr — applies OCR to all content regardless of existing text layer.
output_fields: JSON object controlling which output fields are included in the response.
Example: {"html": false, "markdown": true, "ocr": false}.
All fields default to true.
Page range to process. Formats: "1-5", "2,4,6", "[1,3,5]". Defaults to all pages.
segment_analysis: JSON object controlling HTML/Markdown generation strategy and AI model per segment type.
Example: {"Table": {"html": "VLM", "markdown": "VLM", "model_id": "us_table_v2"}}.
segment_filter: Content filter: comma-separated segment types to keep.
Example: "table,picture". Use "all" to include everything. Defaults to "all".
segment_processing: Alias for segment_analysis (Core Parser name). If both are provided, this takes precedence.
Segment type naming convention.
Unsiloed (default) — e.g., PageHeader, ListItem, Picture.
Other — alternative names e.g., Header, List Item, Figure.
url: Presigned or public URL of the document to fetch and process.
Required if file is not provided.
use_high_resolution: Use high-resolution images for cropping and post-processing.
Latency penalty: ~2–3 s per page. Defaults to true.
validate_segments: JSON array string of segment types to validate with VLM.
Example: ["Table", "Formula", "Picture"]. Defaults to [].
Legacy: validate table segment classifications using VLM.
Prefer validate_segments: ["Table"] instead. Defaults to false.
xml_citation: Extract and hyperlink bibliography citations in the markdown output. PDFs only.
Defaults to false.
Job created — poll with GET /parse/{job_id} to retrieve results.
Response body for a successful POST /parse call.
created_at: ISO 8601 timestamp when the job was created.
credit_used: Number of pages deducted from your quota for this job.
file_name: Name of the uploaded file, or "unknown" when a URL was provided.
job_id: Job identifier — pass this to GET /parse/{job_id} to poll for results.
merge_tables: Whether table merging is enabled for this job (reflects the submitted merge_tables value).
message: Human-readable status message with a polling hint.
quota_remaining: Remaining page quota after this job's deduction.
status: Initial job status. Always "Starting" on creation.
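Putting it together, a minimal end-to-end sketch (upload, poll, read segments). Only endpoints and fields documented above are used; the exact shape of the exports field is not shown in this reference, so inspect it before relying on it.

import time
import requests

BASE = "https://prod.visionapi.unsiloed.ai"
HEADERS = {"api-key": "your-api-key"}

# 1. Create the parse job.
with open("document.pdf", "rb") as f:
    job = requests.post(
        f"{BASE}/parse",
        headers=HEADERS,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"ocr_strategy": "auto_detection", "export_format": '["docx"]'},
    ).json()

# 2. Poll until the job finishes.
while True:
    result = requests.get(f"{BASE}/parse/{job['job_id']}", headers=HEADERS).json()
    if result["status"] in ("Succeeded", "Failed"):
        break
    time.sleep(5)

# 3. Read the parsed segments (and, if requested, the exports field).
if result["status"] == "Succeeded":
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            print(segment["segment_type"], segment.get("content", "")[:80])
    print("Exports:", result.get("exports"))  # presigned URL(s); shape not documented above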