POST /parse
curl -X 'POST' \
  'https://prod.visionapi.unsiloed.ai/parse' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@document.pdf;type=application/pdf' \
  -F 'use_high_resolution=true' \
  -F 'segmentation_method=smart_layout_detection' \
  -F 'ocr_mode=auto_ocr' \
  -F 'ocr_engine=UnsiloedHawk' \
  -F 'merge_tables=true' \
  -F 'keep_segment_types=all' \
  -F 'segment_analysis={"Table":{"html":"LLM","markdown":"LLM","extended_context":true,"crop_image":"All","model_id":"us_table_v2"}}'
{
  "job_id": "e77a5c42-4dc1-44d0-a30e-ed191e8a8908",
  "status": "Starting",
  "file_name": "document.pdf",
  "created_at": "2025-07-18T10:42:10.545832520Z",
  "message": "Job created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "quota_remaining": 23700,
  "merge_tables": false
}

Overview

The Parse Document endpoint processes PDFs, images (PNG, JPEG, TIFF), and office documents (PPT, DOCX, XLSX) and breaks them into meaningful sections, with detailed analysis that includes text extraction, image recognition, table parsing, and OCR data. You can provide a document either by direct file upload or by presigned URL. The endpoint also supports customization options for tuning the parsing behavior to your use case.
This endpoint returns a job ID for asynchronous processing. Use the GET PARSE JOB STATUS endpoint to check status and retrieve results when processing is complete.
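The submission step can be sketched in Python. This is a minimal sketch, not an official client: `build_parse_fields` and `submit_parse_job` are hypothetical helper names, and the encoding (booleans as lowercase strings, nested objects as JSON strings) is an assumption taken from the cURL example on this page.

```python
import json

PARSE_URL = "https://prod.visionapi.unsiloed.ai/parse"


def build_parse_fields(**options):
    """Convert Python values into multipart form field strings.

    Assumption: booleans are sent as "true"/"false" and nested objects
    as JSON strings, mirroring the cURL example.
    """
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # omit unset options so the server defaults apply
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            fields[name] = json.dumps(value, separators=(",", ":"))
        else:
            fields[name] = str(value)
    return fields


def submit_parse_job(file_path, api_key, **options):
    """Upload a file and return the job-creation response as a dict."""
    import requests  # third-party; imported lazily so build_parse_fields has no dependency

    with open(file_path, "rb") as fh:
        response = requests.post(
            PARSE_URL,
            headers={"api-key": api_key, "accept": "application/json"},
            files={"file": fh},  # requests sets the multipart Content-Type itself
            data=build_parse_fields(**options),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

A call such as `submit_parse_job("document.pdf", "your-api-key", merge_tables=True)` would then return the JSON body containing `job_id`, which is passed to the status endpoint.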

Request

You must provide exactly one of the file or url parameters. Providing both, or neither, is an error.
file
file
Document file to process. Supported formats: PDF, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX). Required if url is not provided.
url
string
Presigned URL of the document to process. The URL must be publicly accessible or a valid presigned URL from cloud storage (S3, GCS, Azure Blob, etc.). Supported formats: PDF, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX). Required if file is not provided.
use_high_resolution
boolean
Whether to use high-resolution images for cropping and post-processing. Adds a latency penalty of roughly 2-3 seconds per page. Default: false
segmentation_method
string
Document segmentation strategy:
  • "smart_layout_detection" (default): Analyzes pages for layout elements using bounding boxes
  • "page_by_page": Treats each page as a single segment
ocr_mode
string
OCR processing strategy:
  • "auto_ocr" (default): Automatically determine when to use OCR
  • "full_ocr": Process all text elements with OCR
ocr_engine
string
OCR engine selection:
  • "UnsiloedHawk" (default): Higher accuracy, better for complex layouts
  • "UnsiloedStorm": Fast processing, optimized for general documents
merge_tables
boolean
Whether to merge tables that span across multiple pages into a single unified table structure. When enabled, consecutive table segments with matching headers will be consolidated. Default: false
keep_segment_types
string
Filter output to include only specific segment types. Accepts a comma-separated list of segment types or "all" to include everything. Examples: "table", "picture", "table,picture", "table,formula". Default: "all"
Available segment types:
  • table: Tabular data segments
  • picture: Image and graphic segments
  • formula: Mathematical equations
  • text: Regular text content
  • sectionheader: Section headers
  • title: Document titles
  • listitem: List items
  • caption: Image captions
  • footnote: Footnotes
  • pageheader: Page headers
  • pagefooter: Page footers
output_fields
string
JSON configuration object to control which fields are included in the response. By default, all fields are included. Set fields to false to exclude them and reduce response size. Example: {"html": false, "markdown": true, "ocr": false}. Available fields: html, markdown, ocr, image, llm, content, bbox, confidence, embed.
segment_analysis
string
JSON configuration object to customize how different segment types are processed. Allows you to control HTML/Markdown generation strategies, specify which field should populate the content field for each segment type, and configure the AI model for table processing. Example: {"Table": {"html": "LLM", "markdown": "LLM", "content_source": "HTML", "model_id": "us_table_v2"}}.

Parameter Details

File Input Options

The API supports two methods for providing the document to process:
  1. Direct File Upload (file parameter): Upload the document file directly as multipart/form-data
  2. Presigned URL (url parameter): Provide a publicly accessible URL or presigned URL to the document
Important Notes:
  • You must provide either file or url, but not both
  • When using url, the document will be downloaded from the provided URL before processing
  • Presigned URLs are ideal for documents already stored in cloud storage (S3, GCS, Azure Blob, etc.)
  • The URL must be publicly accessible or include necessary authentication parameters (e.g., S3 presigned URLs with signatures)
  • Supported formats are the same for both methods: PDF, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX)
Use Cases for Presigned URLs:
  • Documents already stored in cloud storage
  • Avoiding duplicate file uploads
  • Integration with existing document management systems
  • Processing large files without upload overhead
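The either-or rule above can be checked on the client before a request is sent. The following is a small sketch under stated assumptions: `choose_source` is a hypothetical helper, and the URL check only verifies the general shape of the URL, not that it is reachable or correctly signed.

```python
from urllib.parse import urlparse


def choose_source(file_path=None, url=None):
    """Return ("file", path) or ("url", url); exactly one input must be given."""
    if file_path and url:
        raise ValueError("Provide either file or url, not both")
    if not file_path and not url:
        raise ValueError("Provide one of file or url")
    if url:
        parsed = urlparse(url)
        # Shape check only: an http(s) scheme and a host must be present.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Malformed URL: {url!r}")
        return ("url", url)
    return ("file", file_path)
```

Failing early in this way avoids a round trip for the "Missing File or URL" and "Both File and URL Provided" error cases.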

Segmentation Method

The segmentation_method parameter controls how the document is analyzed and segmented:
  • "smart_layout_detection" (default): Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking for complex documents.
  • "page_by_page": Treats each page as a single segment. Faster processing, ideal for simple documents without complex layouts.

OCR Mode

The ocr_mode parameter controls optical character recognition processing:
  • "auto_ocr" (default): Intelligently determines when OCR is needed based on the document content. Balances accuracy and performance.
  • "full_ocr": Applies OCR to all text elements in the document. Use this for scanned documents or when maximum text extraction is required.

OCR Engine Selection

Select the OCR engine for text recognition:
  • "UnsiloedHawk" (default): Higher accuracy, better for complex layouts and multilingual content
  • "UnsiloedStorm": Fast processing, optimized for general documents
Engine Comparison:
  • UnsiloedStorm:
    • Faster processing, at some cost in accuracy
    • Good for standard documents
    • Optimized for English text
    • Lower resource usage
  • UnsiloedHawk:
    • Higher accuracy, at the cost of longer processing time
    • Better handling of complex layouts
    • Superior multilingual support
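Since segmentation_method, ocr_mode, and ocr_engine each accept a fixed set of values, a client can validate them before submitting. The sketch below assumes the values are case-sensitive and must match the spellings listed on this page; `validate_enum_options` is a hypothetical helper name.

```python
# Allowed values as documented on this page.
ALLOWED = {
    "segmentation_method": {"smart_layout_detection", "page_by_page"},
    "ocr_mode": {"auto_ocr", "full_ocr"},
    "ocr_engine": {"UnsiloedHawk", "UnsiloedStorm"},
}


def validate_enum_options(options):
    """Return a list of error messages for invalid enumerated values."""
    errors = []
    for name, allowed in ALLOWED.items():
        value = options.get(name)
        # Unset options are fine: the server applies its default.
        if value is not None and value not in allowed:
            errors.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return errors
```

An empty list means all supplied values are recognized; anything else can be reported to the caller before a request is made.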

Table Merging

The merge_tables parameter enables intelligent merging of tables that span multiple pages.
How It Works:
  • Analyzes consecutive table segments across pages
  • Identifies tables with matching column headers
  • Merges them into a single unified table structure
  • Preserves table formatting and data integrity
When to Use:
  • Multi-Page Financial Statements: Consolidate P&L statements or balance sheets spanning multiple pages
  • Large Data Tables: Merge inventory lists, transaction records, or data sets split across pages
  • Reports with Continuation Tables: Automatically combine tables marked with “continued on next page”
Example:
{
  "merge_tables": true
}
Benefits:
  • Simplified Data Processing: Work with complete tables instead of fragments
  • Better Context: Maintain full table context for analysis and extraction
  • Reduced Post-Processing: Eliminates need for manual table stitching
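The merging idea described under "How It Works" can be pictured with a small client-side sketch. This illustrates the concept only and is not the service's implementation: the table shape (`page`, `headers`, `rows`) and the rule that only tables on directly consecutive pages are merged are assumptions made for the example.

```python
def merge_consecutive_tables(tables):
    """Merge adjacent tables whose headers match.

    Each input table is a dict {"page": int, "headers": [...], "rows": [[...], ...]}.
    This shape is hypothetical and chosen for illustration only.
    """
    merged = []
    for table in tables:
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["headers"] == table["headers"]
            and table["page"] == previous["last_page"] + 1
        ):
            # Same headers on the next page: treat as a continuation.
            previous["rows"].extend(table["rows"])
            previous["last_page"] = table["page"]
        else:
            merged.append({
                "headers": list(table["headers"]),
                "rows": [list(row) for row in table["rows"]],
                "first_page": table["page"],
                "last_page": table["page"],
            })
    return merged
```

With merge_tables enabled, this kind of stitching happens on the server, so client code like the above should not be needed.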

Content Type Filtering

The keep_segment_types parameter allows you to filter the output to include only specific segment types, reducing response size and focusing on relevant content.
How It Works:
  • Accepts a comma-separated list of segment types (case-insensitive)
  • Filters segments after processing is complete
  • Removes chunks that have no segments after filtering
Available Options:
  • "all" (default): Include all segment types
  • "table": Only table segments
  • "picture": Only image/graphic segments
  • "table,picture": Tables and pictures only
  • "table,formula": Tables and formulas only
  • Custom combinations using any segment type
Supported Segment Types:
  • table, picture, formula, text, sectionheader, title, listitem, caption, footnote, pageheader, pagefooter
Example Usage:
{
  "keep_segment_types": "table,picture"
}
Use Cases:
  • Tables Only: Extract only tabular data from financial documents
  • Pictures Only: Extract charts, graphs, and diagrams for visual analysis
  • Tables + Pictures: Get structured data and visualizations, skip text content
  • Custom Combinations: Mix any segment types based on your needs
Benefits:
  • Reduced Response Size: Filter out unwanted content before receiving results
  • Faster Processing: Less data to transfer and parse
  • Focused Extraction: Get only the content types you need
  • Cost Optimization: Smaller responses reduce bandwidth usage
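The filtering behavior described above (case-insensitive matching, then dropping chunks left without segments) can be mirrored on the client, for example to narrow a result that was fetched with "all". The sketch assumes the chunks/segments layout shown in the results example on this page; `filter_chunks` is a hypothetical helper.

```python
def filter_chunks(chunks, keep_segment_types="all"):
    """Keep only segments whose type is listed; drop chunks that end up empty."""
    wanted = {t.strip().lower() for t in keep_segment_types.split(",") if t.strip()}
    if "all" in wanted:
        return chunks
    filtered = []
    for chunk in chunks:
        segments = [
            segment
            for segment in chunk.get("segments", [])
            if segment.get("segment_type", "").lower() in wanted
        ]
        if segments:  # chunks with no remaining segments are removed
            filtered.append({**chunk, "segments": segments})
    return filtered
```

Passing the filter to the API instead keeps the unwanted segments out of the response entirely, which is where the size savings come from.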

Output Fields Configuration

The output_fields parameter allows you to control which fields are included in the API response. This is useful for reducing response size, improving performance, and optimizing bandwidth usage when you don't need all available data.
Available Fields:
  • html (default: true): Include HTML representation of segments
  • markdown (default: true): Include Markdown representation of segments
  • ocr (default: true): Include OCR results with bounding boxes and confidence scores
  • image (default: true): Include cropped segment images (base64 encoded)
  • llm (default: true): Include LLM-generated content and descriptions
  • content (default: true): Include text content of segments
  • bbox (default: true): Include bounding box coordinates
  • confidence (default: true): Include confidence scores for segments
  • embed (default: true): Include embed text in chunk responses
Usage: Set fields to false to exclude them from the response. Fields not specified default to true for backward compatibility.
Example Configuration:
{
  "html": false,
  "markdown": true,
  "ocr": false,
  "image": false,
  "llm": false,
  "content": true,
  "bbox": true,
  "confidence": false,
  "embed": true
}
Benefits:
  • Reduced Response Size: Excluding large fields like image and html can significantly reduce payload size
  • Faster Processing: Less data to serialize and transfer
  • Cost Optimization: Smaller responses reduce bandwidth costs
  • Selective Data: Only retrieve the fields you need for your use case
When to Use:
  • Minimal Response: Set most fields to false when you only need basic content
  • Text-Only Processing: Exclude image, ocr, and llm when processing text content
  • Embedding Generation: Include only content and embed when generating embeddings
  • Full Analysis: Keep all fields enabled (default) for comprehensive document analysis
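Because unspecified fields default to true, only the excluded fields need to appear in the JSON string. A minimal sketch of building that string follows; `build_output_fields` is a hypothetical helper and the field list is copied from this page.

```python
import json

# Field names as documented on this page.
OUTPUT_FIELDS = (
    "html", "markdown", "ocr", "image", "llm",
    "content", "bbox", "confidence", "embed",
)


def build_output_fields(exclude):
    """Return the output_fields JSON string with the given fields set to false."""
    unknown = set(exclude) - set(OUTPUT_FIELDS)
    if unknown:
        raise ValueError(f"Unknown output fields: {sorted(unknown)}")
    # Fields left out of the object keep their default of true.
    return json.dumps({name: False for name in OUTPUT_FIELDS if name in exclude})
```

For example, `build_output_fields(["image", "ocr"])` yields a string that disables only the image and OCR fields, suitable as the value of the output_fields form field.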

Segment Analysis Configuration

The segment_analysis parameter allows you to customize how different segment types are processed, including HTML/Markdown generation strategies and which field should populate the content field.
Available Segment Types: You can configure processing for any of the following segment types:
  • Table: Tabular data segments
  • Picture: Image and graphic segments
  • Formula: Mathematical equations
  • Title: Document titles
  • SectionHeader: Section headers
  • Text: Regular text content
  • ListItem: List items
  • Caption: Image captions
  • Footnote: Footnotes
  • PageHeader: Page headers
  • PageFooter: Page footers
  • Page: Full page segments
Configuration Options: For each segment type, you can specify:
  • html: Generation strategy for HTML representation
    • "Auto" (default): Automatically determine the best method
    • "LLM": Use LLM to generate HTML
  • markdown: Generation strategy for Markdown representation
    • "Auto" (default): Automatically determine the best method
    • "LLM": Use LLM to generate Markdown
  • content_source: Defines which field should populate the content field in the response
    • "OCR" (default): Use OCR text for content
    • "HTML": Use HTML representation as content
    • "Markdown": Use Markdown representation as content
  • model_id (Table segments only): Specifies which AI model to use for table processing
    • "us_table_v1": Standard table processing model
    • "us_table_v2": Enhanced table processing model with improved accuracy
Example Configuration:
{
  "Table": {
    "html": "LLM",
    "markdown": "LLM",
    "content_source": "HTML",
    "model_id": "us_table_v2"
  },
  "Picture": {
    "html": "LLM",
    "markdown": "LLM",
    "content_source": "Markdown"
  }
}
How content_source Works:
The content_source parameter determines which field's value will be used to populate the content field in the segment response:
  • When content_source is set to "HTML", the content field will contain the HTML representation, and the separate html and markdown fields will be empty
  • When content_source is set to "Markdown", the content field will contain the Markdown representation, and the separate html and markdown fields will be empty
  • When content_source is set to "OCR" (default), the content field contains OCR text, and html and markdown fields are populated separately
  • When content_source is set to "LLM", the content field contains LLM-generated content
Use Cases:
  • HTML as Content: Set content_source: "HTML" for Table segments when you want HTML-formatted table data directly in the content field
  • Markdown as Content: Set content_source: "Markdown" for Picture segments when you want Markdown-formatted descriptions in the content field
  • LLM-Enhanced Content: Use "LLM" for both html/markdown generation strategies and set content_source: "LLM" to get AI-enhanced content in the content field
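A segment_analysis value can be assembled and sanity-checked before it is JSON-encoded. The sketch below is an assumption-laden helper, not part of the API: `build_segment_analysis` is a hypothetical name, the checks cover only the segment type names, the html/markdown strategies, and the Table-only model_id rule listed above, and content_source is passed through unchecked.

```python
import json

# Segment type names and strategies as documented on this page.
SEGMENT_TYPES = {
    "Table", "Picture", "Formula", "Title", "SectionHeader", "Text",
    "ListItem", "Caption", "Footnote", "PageHeader", "PageFooter", "Page",
}
STRATEGIES = {"Auto", "LLM"}


def build_segment_analysis(config):
    """Validate a per-segment-type config and return it as a JSON string."""
    for segment_type, settings in config.items():
        if segment_type not in SEGMENT_TYPES:
            raise ValueError(f"Unknown segment type: {segment_type!r}")
        for key in ("html", "markdown"):
            if key in settings and settings[key] not in STRATEGIES:
                raise ValueError(f"{segment_type}.{key} must be one of {sorted(STRATEGIES)}")
        if "model_id" in settings and segment_type != "Table":
            raise ValueError("model_id applies to Table segments only")
    return json.dumps(config)
```

The returned string is what would be sent as the segment_analysis form field.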

Response

job_id
string
Unique identifier for the parsing job
status
string
Initial job status (typically “Starting”)
file_name
string
Name of the uploaded file
created_at
string
Timestamp when the job was created
message
string
Status message about the job creation
quota_remaining
number
Remaining page quota for the API key
merge_tables
boolean
Whether table merging is enabled for this job
keep_segment_types
string
Segment types filter applied to this job (stored in metadata)

Document Analysis Features

The parsing endpoint provides comprehensive document analysis including:

Text Extraction

Extracts text content with high accuracy, preserving formatting and structure.

Image Recognition

Identifies and analyzes images within documents, providing descriptions and metadata.

Table Parsing

Extracts tabular data with proper structure and formatting.

OCR Processing

Performs optical character recognition on text elements with confidence scores.

Section Detection

Automatically identifies different document sections like headers, body text, and captions.

Bounding Box Information

Provides precise coordinates for all extracted elements.

Advanced Content Processing

  • LLM-Enhanced Analysis: Uses language models for better content understanding
  • Multi-Format Output: Generates HTML, Markdown, and plain text versions
  • Context-Aware Processing: Maintains document context across segments
  • Intelligent Chunking: Creates semantically meaningful document chunks

Retrieving Results

After the job is created, use the GET /parse/{job_id} endpoint to check status and retrieve results:
cURL
curl -X 'GET' \
  'https://prod.visionapi.unsiloed.ai/parse/{job_id}' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key'
Python
import time

import requests

def get_parse_results(job_id, api_key, poll_interval=5, timeout=600):
    """Poll the job until it succeeds or fails, then return the response."""

    headers = {"api-key": api_key, "accept": "application/json"}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"
    deadline = time.monotonic() + timeout  # stop polling after `timeout` seconds

    while time.monotonic() < deadline:
        response = requests.get(status_url, headers=headers, timeout=30)
        response.raise_for_status()  # surface HTTP errors instead of looping forever

        status_data = response.json()
        print(f"Job Status: {status_data['status']}")

        if status_data["status"] == "Succeeded":
            return status_data  # Results are included in the same response

        if status_data["status"] == "Failed":
            raise RuntimeError(f"Job failed: {status_data.get('message', 'Unknown error')}")

        time.sleep(poll_interval)  # Check every 5 seconds by default

    raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")

# Usage
job_id = "e77a5c42-4dc1-44d0-a30e-ed191e8a8908"
results = get_parse_results(job_id, "your-api-key")

Expected Results Structure

When the job completes successfully, the response contains the job metadata and a chunks array, where each chunk holds a list of analyzed segments:
{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Succeeded",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "started_at": "2025-10-22T06:51:16.966136Z",
  "finished_at": "2025-10-22T06:57:19.821541Z",
  "total_chunks": 25,
  "chunks": [
    {
      "segments": [
        {
          "segment_type": "Title",
          "content": "Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "image": null,
          "page_number": 1,
          "segment_id": "cc5f8dff-31be-4ccf-885d-4f9062fcee17",
          "confidence": 0.90187776,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<h1>Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)</h1>",
          "markdown": "# Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "bbox": {
            "left": 72.92226,
            "top": 62.030334,
            "width": 230.36308,
            "height": 55.395317
          },
          "ocr": [
            {
              "bbox": {
                "left": 63.753525,
                "top": 5.395447,
                "width": 164.45312,
                "height": 42.757812
              },
              "text": "Disinvestment",
              "confidence": 0.9999992
            }
          ]
        },
        {
          "segment_type": "Text",
          "content": "Background and context information about the disinvestment process...",
          "image": null,
          "page_number": 1,
          "segment_id": "9d60e48b-77ba-4a23-a0ac-95ee13c615ec",
          "confidence": 0.88558982,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<p>Background and context information about the disinvestment process...</p>",
          "markdown": "Background and context information about the disinvestment process...",
          "bbox": {
            "left": 486.9685,
            "top": 139.61847,
            "width": 241.29932,
            "height": 48.451706
          },
          "ocr": [
            {
              "bbox": {
                "left": 50.9729,
                "top": 3.4557495,
                "width": 46.046875,
                "height": 19.734375
              },
              "text": "Background",
              "confidence": 0.99999654
            }
          ]
        }
      ]
    }
  ]
}
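A result shaped like the example above can be traversed with a few lines of Python. The sketch below groups segments by type and expresses a segment's bounding box as fractions of the page. It assumes that the segment-level bbox uses the same units as page_width and page_height, which the example suggests but this page does not state; both function names are hypothetical.

```python
from collections import defaultdict


def group_segments_by_type(result):
    """Return {segment_type: [segment, ...]} across all chunks."""
    grouped = defaultdict(list)
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            grouped[segment["segment_type"]].append(segment)
    return dict(grouped)


def relative_bbox(segment):
    """Express a segment's bbox as fractions (0..1) of the page size.

    Assumption: bbox and page_width/page_height share the same units.
    """
    box = segment["bbox"]
    width, height = segment["page_width"], segment["page_height"]
    return {
        "left": box["left"] / width,
        "top": box["top"] / height,
        "width": box["width"] / width,
        "height": box["height"] / height,
    }
```

For instance, `group_segments_by_type(results).get("Table", [])` would collect every table segment from a finished job.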

Segment Types

The parsing API identifies and processes different types of document segments with enhanced processing:

Picture

Images and graphics within the document, including logos, charts, and illustrations. Enhanced with LLM-based description generation.

SectionHeader

Document headers and titles that define section boundaries. Processed with semantic understanding.

Text

Regular text content including paragraphs, sentences, and individual text elements. Enhanced with context-aware processing.

Table

Tabular data with structured rows and columns. Enhanced with LLM-based formatting and extended context options. You can configure the table processing model using model_id in the segment_analysis parameter:
  • us_table_v1: Standard table processing model
  • us_table_v2: Enhanced table processing model with improved accuracy

Caption

Text captions associated with images or figures. Processed with relationship awareness.

Formula

Mathematical equations and expressions. Enhanced with specialized formula processing.

Title

Document titles and main headings. Processed with enhanced formatting.

Footnote

Document footnotes and references. Processed with context linking.

ListItem

Bulleted and numbered list items. Processed with structure preservation.

Each segment includes detailed metadata such as confidence scores, bounding boxes, OCR data, and formatted output in both HTML and Markdown with LLM enhancement.

Configuration Best Practices

For High-Accuracy Processing

Use this configuration when accuracy is critical:
{
  "use_high_resolution": true,
  "segmentation_method": "smart_layout_detection",
  "ocr_mode": "full_ocr",
  "ocr_engine": "UnsiloedHawk",
  "merge_tables": true,
  "segment_analysis": {
    "Table": {
      "html": "LLM",
      "markdown": "LLM",
      "extended_context": true,
      "crop_image": "All",
      "model_id": "us_table_v2"
    }
  }
}

For Fast Processing

Use this configuration when speed is prioritized:
{
  "use_high_resolution": false,
  "segmentation_method": "page_by_page",
  "ocr_mode": "auto_ocr",
  "ocr_engine": "UnsiloedStorm"
}

For Financial Documents (Tables + Charts)

Extract only tables and charts from financial reports:
{
  "merge_tables": true,
  "keep_segment_types": "table,picture",
  "segmentation_method": "smart_layout_detection",
  "ocr_engine": "UnsiloedHawk",
  "segment_analysis": {
    "Table": {
      "html": "LLM",
      "markdown": "LLM",
      "model_id": "us_table_v2"
    }
  }
}

For Data Extraction Only (Tables)

Extract only tabular data with minimal response size:
{
  "merge_tables": true,
  "keep_segment_types": "table",
  "output_fields": {
    "html": true,
    "markdown": false,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": false,
    "confidence": false
  }
}

OCR Engine Selection Guide

Choose the appropriate OCR engine based on your document characteristics.
Use UnsiloedStorm when:
  • Processing standard business documents
  • Speed is prioritized over accuracy
  • Working with primarily English text
  • Processing large volumes of documents
  • Resource efficiency is important
  • You can tolerate some noise in the output for faster processing
Use UnsiloedHawk when:
  • High accuracy is critical
  • Working with multilingual documents
  • Processing complex layouts
  • Quality over speed is preferred
  • You need clean, accurate text extraction

Output Fields Optimization (Optional)

Optimize response size and performance by selectively including only the fields you need.
For Minimal Response Size:
{
  "output_fields": {
    "html": false,
    "markdown": false,
    "ocr": false,
    "image": false,
    "llm": false,
    "content": true,
    "bbox": false,
    "confidence": false,
    "embed": true
  }
}
For Text-Only Processing:
{
  "output_fields": {
    "html": false,
    "markdown": true,
    "ocr": false,
    "image": false,
    "llm": false,
    "content": true,
    "bbox": true,
    "confidence": false,
    "embed": true
  }
}
For Full Analysis (Default): Omit output_fields or set all fields to true to include all available data.

Error Handling

Common Error Scenarios

  1. Invalid API Key: Authentication failed
  2. File Too Large: File exceeds size limits
  3. Invalid Configuration: Malformed processing parameters
  4. Server Error: Internal processing error
  5. Processing Timeout: Task took too long to complete
  6. Missing File or URL: Neither file nor url parameter provided
  7. Both File and URL Provided: Cannot provide both file and url simultaneously
  8. Invalid URL: URL is not accessible or malformed
  9. URL Download Failed: Unable to download document from provided URL

Authorizations

api-key
string
header
required

Body

multipart/form-data
file
file
required

Supported file types: PDFs, Images (PNG, JPEG, TIFF, BMP) and Office Documents (DOCX, XLSX, PPTX)

use_high_resolution
boolean
default:false

Whether to use high-resolution images for cropping and post-processing (default: false)

segmentation_method
enum<string>
default:smart_layout_detection

Document segmentation strategy

Available options:
smart_layout_detection,
page_by_page
ocr_mode
enum<string>
default:auto_ocr

OCR processing strategy

Available options:
auto_ocr,
full_ocr
ocr_engine
enum<string>
default:UnsiloedHawk

OCR engine selection: 'UnsiloedHawk' (higher accuracy) or 'UnsiloedStorm' (faster processing)

Available options:
UnsiloedHawk,
UnsiloedStorm

Response

200 - application/json

Successful response

job_id
string

Unique identifier for the parsing job

status
string

Initial job status (typically 'Starting')

file_name
string

Name of the uploaded file

created_at
string

Timestamp when the job was created

message
string

Status message about the job creation

quota_remaining
number

Remaining page quota for the API key

merge_tables
boolean

Whether table merging is enabled