Overview

Document parsing in the Vision API analyzes a document's structure with AI models. It identifies elements such as text blocks, tables, images, headers, and footers, preserves their reading order, and extracts their content.
Parsing is available through the /parse endpoint, which combines structure detection with content extraction.

Key Features

Element Detection

Identify and classify document elements using advanced AI models

Content Extraction

Extract text, tables, and images with appropriate processing methods

Reading Order

Maintain proper document flow and reading sequence

Multi-Modal Processing

Handle text, images, tables, and formulas with specialized extractors

Document Element Types

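The parser identifies the following element types and returns them as segments grouped into chunks. Type names in API responses may be formatted differently from the labels listed here; the sample response below, for instance, reports a section heading as SectionHeader.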

Text Elements

  • Text: Regular paragraph text
  • Title: Document and section titles
  • Section-header: Section headings
  • Page-header: Header content
  • Page-footer: Footer content
  • Caption: Image and table captions
  • Footnote: Footnote references and content
  • List-item: Bulleted and numbered lists

Visual Elements

  • Table: Structured tabular data
  • Picture: Images, charts, and diagrams
  • Formula: Mathematical equations and expressions
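The two categories above can be used to sort parsed segments. The following is a minimal sketch, not part of the API: group_segments and normalize_type are hypothetical helpers, and the normalization (lowercasing and dropping hyphens) is an assumption made so that labels such as Section-header and the SectionHeader value seen in the sample response compare equal.

```python
from collections import defaultdict

# Element types from the lists above, normalized (lowercase, no hyphens).
# This normalization is an illustrative assumption, not documented API behavior.
TEXT_TYPES = {"text", "title", "sectionheader", "pageheader", "pagefooter",
              "caption", "footnote", "listitem"}
VISUAL_TYPES = {"table", "picture", "formula"}


def normalize_type(segment_type: str) -> str:
    """Lowercase a type name and keep only letters and digits."""
    return "".join(ch for ch in segment_type.lower() if ch.isalnum())


def group_segments(result: dict) -> dict:
    """Group every segment of a parse result into 'text', 'visual', or 'other'."""
    groups = defaultdict(list)
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            kind = normalize_type(segment.get("segment_type", ""))
            if kind in TEXT_TYPES:
                groups["text"].append(segment)
            elif kind in VISUAL_TYPES:
                groups["visual"].append(segment)
            else:
                groups["other"].append(segment)
    return dict(groups)
```

Unrecognized type names fall into the "other" group rather than raising, so the helper keeps working if the API reports a type that is not listed here.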

Quick Start

Using Local files

import requests
import time

headers = {"api-key": "your-api-key"}

# Parse with merge_tables enabled
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://prod.visionapi.unsiloed.ai/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"merge_tables": "true"}
    )

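# Stop early if the submission was rejected
if response.status_code != 200:
    print(f"Error: {response.status_code} - {response.text}")
    raise SystemExit(1)

# Parsing runs asynchronously; the response carries a job ID to poll
job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")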

# Poll every 5 seconds until the job succeeds or fails
while True:
    poll = requests.get(
        f"https://prod.visionapi.unsiloed.ai/parse/{job_id}",
        headers=headers
    )
    poll.raise_for_status()  # surface HTTP errors instead of a KeyError below
    result = poll.json()
    print(f"Status: {result['status']}")

    if result["status"] == "Succeeded":
        break
    elif result["status"] == "Failed":
        print(f"Error: {result.get('message', 'Unknown error')}")
        raise SystemExit(1)
    time.sleep(5)

print(f"Total chunks: {result['total_chunks']}")

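# Print each segment's type and a preview of its text
for chunk in result["chunks"]:
    for segment in chunk["segments"]:
        print(f"Type: {segment['segment_type']}")
        # "content" can be absent (e.g. on Picture and Table segments), so default to ""
        content = segment.get("content") or ""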
        print(f"Content: {content[:100]}...")
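
The polling loop above runs until the job reports Succeeded or Failed. If you want an upper bound on the wait, one option is to wrap the loop in a helper with a deadline. The sketch below is an illustration only: wait_for_job, its parameters, and the default timeout are hypothetical, and the status check relies solely on the Succeeded and Failed values shown above.

```python
import time


def wait_for_job(fetch_status, timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Poll fetch_status() until the job succeeds, fails, or the timeout elapses.

    fetch_status is any zero-argument callable that returns the job JSON as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError(result.get("message", "Unknown error"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the variables from the example above, fetch_status could be a lambda that calls requests.get on the /parse/{job_id} URL with the same headers and returns .json(). Passing the fetcher in as a callable keeps the helper independent of the HTTP client and easy to test.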

Using Presigned URLs

Instead of uploading a file directly, you can pass a presigned URL in the url parameter. This suits documents already stored in cloud storage (S3, GCS, Azure Blob, etc.).

import requests
import time

headers = {"api-key": "your-api-key"}

# Using presigned URL instead of file upload
document_url = "https://lyltzyvtloozzovxrupp.supabase.co/storage/v1/object/public/pdfs/0589f42e-0684-434c-999a-beedcf34c04a/Invoice_bc11de5d-e45a-46a8-892d-32f8794ab72d.pdf"

print("Parsing document from URL...")
response = requests.post(
    "https://prod.visionapi.unsiloed.ai/parse",
    headers=headers,
    data={"url": document_url}
)

# Stop early if the submission was rejected
if response.status_code != 200:
    print(f"Error: {response.status_code} - {response.text}")
    raise SystemExit(1)

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

# Poll every 5 seconds until the job succeeds or fails
while True:
    poll = requests.get(
        f"https://prod.visionapi.unsiloed.ai/parse/{job_id}",
        headers=headers
    )
    poll.raise_for_status()  # surface HTTP errors instead of a KeyError below
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] == "Succeeded":
        break
    if result["status"] == "Failed":
        print(f"Error: {result.get('message', 'Unknown error')}")
        raise SystemExit(1)
    time.sleep(5)

# Access the parsed content
print(f"Total chunks: {result['total_chunks']}")

# Print the first 100 characters of each chunk's embed text
for chunk in result["chunks"]:
    print(f"\n--- {chunk['embed'][:100]} ---")

For detailed API parameters and configuration options, see the Parse API Reference.

Response Format

Parsed content is organized into chunks. Each chunk contains segments that carry the extracted content, a bounding box, and metadata such as page number and confidence:

{
  "job_id": "1699d429-9c2e-464e-b311-d4b68a8444b8",
  "status": "Succeeded",
  "file_name": "document.pdf",
  "total_chunks": 3,
  "page_count": 1,
  "created_at": "2026-01-05T15:06:27.966175Z",
  "started_at": "2026-01-05T15:06:28.130578Z",
  "finished_at": "2026-01-05T15:06:36.009842Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 118,
      "embed": "## Tax Invoice on behalf of -\n\nThe parsed content combined from all segments...",
      "segments": [
        {
          "segment_id": "c60d89b1-373e-428d-9950-544e7c903b61",
          "segment_type": "Picture",
          "markdown": "1. DESCRIPTION:\nThe image displays a logo...",
          "html": "<p>The image displays a logo...</p>",
          "image": "https://s3.us-east-1.amazonaws.com/unsiloed-bucket/...",
          "page_number": 1,
          "page_width": 595.0,
          "page_height": 842.0,
          "confidence": 0.9805617332458496,
          "bbox": {
            "left": 34.476383209228516,
            "top": 30.995285034179688,
            "width": 118.25996398925781,
            "height": 29.03614044189453
          },
          "ocr": []
        },
        {
          "segment_id": "16d38aae-9d38-4a5e-8e78-febb2f206e3d",
          "segment_type": "SectionHeader",
          "content": "Tax Invoice on behalf of -",
          "markdown": "## Tax Invoice on behalf of -",
          "html": "<h2>Tax Invoice on behalf of -</h2>",
          "page_number": 1,
          "page_width": 595.0,
          "page_height": 842.0,
          "confidence": 0.5002586245536804,
          "bbox": {
            "left": 33.08928680419922,
            "top": 103.89154052734375,
            "width": 144.82037353515625,
            "height": 11.463272094726562
          },
          "ocr": [
            {
              "bbox": {
                "left": 1.1465377807617188,
                "top": 3.37347412109375,
                "width": 19.81999969482422,
                "height": 5.54998779296875
              },
              "text": "Tax",
              "confidence": null,
              "color": {
                "r": 0,
                "g": 0,
                "b": 0,
                "hex": "#000000"
              }
            }
          ]
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Particulars | Gross value | Discount | Net value |\n| :--- | :--- | :--- | :--- |\n| 1 x Dal Tadka | 170 | 0 | 170 |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/unsiloed-bucket/...",
          "page_number": 1,
          "confidence": 0.9682685732841492,
          "bbox": {
            "left": 32.230628967285156,
            "top": 299.7548522949219,
            "width": 539.4254760742188,
            "height": 106.92483520507812
          }
        }
      ]
    }
  ],
  "metadata": {
    "segment_filter": "all"
  },
  "credit_used": 1,
  "merge_tables": false
}
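
Each bbox in the response is given as left, top, width, and height. To draw or compare boxes it is often handier to have corner coordinates, or fractions of the page size. The sketch below shows one way to do that. It assumes that bbox values use the same units as page_width and page_height with the origin at the top-left of the page, which the sample suggests but this page does not state; both function names are hypothetical.

```python
def bbox_to_corners(bbox: dict) -> tuple:
    """Convert a left/top/width/height box to (x0, y0, x1, y1) corners."""
    x0, y0 = bbox["left"], bbox["top"]
    return (x0, y0, x0 + bbox["width"], y0 + bbox["height"])


def normalized_bbox(segment: dict):
    """Return the segment's corners as fractions of the page size.

    Returns None when the segment has no page dimensions, as is the case
    for the Table segment in the sample response.
    """
    width, height = segment.get("page_width"), segment.get("page_height")
    if not width or not height:
        return None
    x0, y0, x1, y1 = bbox_to_corners(segment["bbox"])
    return (x0 / width, y0 / height, x1 / width, y1 / height)
```

The nested bbox values inside ocr entries are not handled here, since their reference frame is not described on this page.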

Response Fields

Top-Level Fields

  • job_id: Unique identifier for the parsing job
  • status: Job status (Succeeded, Failed, Processing, etc.)
  • file_name: Name of the uploaded file
  • total_chunks: Number of content chunks generated
  • page_count: Total number of pages in the document
  • pdf_url: Temporary S3 URL to access the processed PDF
  • metadata: Additional processing metadata and settings
  • chunks: Array of content chunks (see Chunk Fields)
  • created_at / started_at / finished_at: Timestamps for when the job was created, started, and finished
  • credit_used: Credits consumed by the job
  • merge_tables: Whether table merging was enabled for the job

Chunk Fields

  • chunk_id: Unique identifier for each chunk
  • chunk_length: Character length of the chunk
  • embed: Combined markdown content from all segments in the chunk (ideal for RAG/embeddings)
  • segments: Array of document elements within the chunk
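
Because embed already combines the markdown of every segment in a chunk, it can be passed to an embedding pipeline with little extra work. The following sketch builds one record per chunk. The record shape (id, text, metadata) and the function name chunks_to_records are hypothetical choices for illustration; adapt them to whatever your vector store expects.

```python
def chunks_to_records(result: dict) -> list:
    """Build one record per chunk, ready to hand to an embedding pipeline."""
    records = []
    for chunk in result.get("chunks", []):
        text = (chunk.get("embed") or "").strip()
        if not text:
            continue  # skip chunks with nothing to embed
        # Collect the distinct pages the chunk's segments appear on.
        pages = sorted({s["page_number"] for s in chunk.get("segments", [])
                        if "page_number" in s})
        records.append({
            "id": chunk["chunk_id"],
            "text": text,
            "metadata": {"file_name": result.get("file_name"), "pages": pages},
        })
    return records
```

Keeping chunk_id as the record ID lets you map a retrieved record back to its segments and bounding boxes later.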

Segment Fields

  • segment_id: Unique identifier for each segment
  • segment_type: Element classification (Text, Title, Table, Picture, SectionHeader, etc.)
  • content: Plain text content (for text segments)
  • markdown: Markdown-formatted content
  • html: HTML-formatted content
  • image: S3 URL for visual elements (Picture, Table)
  • bbox: Bounding box with left, top, width, height coordinates
  • page_number: Page where the segment appears
  • page_width / page_height: Page dimensions for coordinate reference
  • confidence: AI model confidence score (0-1) for element detection
  • ocr: Array of OCR results with word-level text, bounding boxes, and colors
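
The confidence and segment_type fields make it straightforward to keep only the segments you trust. The sketch below is one possible filter. The function name select_segments and the 0.8 default threshold are arbitrary illustrations, not recommended values; note that the SectionHeader segment in the sample response has a confidence of about 0.50 and would be dropped at that threshold.

```python
def select_segments(result: dict, min_confidence: float = 0.8, types=None) -> list:
    """Return segments at or above min_confidence, optionally limited to types."""
    selected = []
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            confidence = segment.get("confidence")
            # Treat a missing or null confidence as "not confident enough".
            if confidence is None or confidence < min_confidence:
                continue
            if types is not None and segment.get("segment_type") not in types:
                continue
            selected.append(segment)
    return selected
```

For example, select_segments(result, types={"Table"}) would return only the table segments that pass the threshold.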

Use Cases

  • Document Q&A: Extract structured content for RAG applications
  • Content Migration: Parse and convert documents to markdown or HTML
  • Data Extraction: Identify and extract specific document sections
  • Archive Processing: Batch process large document collections
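
As an illustration of the Data Extraction use case, the markdown field of a Table segment can be turned into row dictionaries with a few lines of code. This is a minimal sketch that assumes a simple pipe-delimited table with a header row followed by an alignment row, as in the sample response; it does not handle escaped pipes or multi-line cells, and parse_markdown_table is a hypothetical helper.

```python
def parse_markdown_table(markdown: str) -> list:
    """Parse a simple pipe-delimited markdown table into a list of row dicts."""
    def split_row(line: str) -> list:
        # Drop the outer pipes, then split the remaining text into cells.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    lines = [line for line in markdown.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header = split_row(lines[0])
    # lines[1] is the alignment row (e.g. ":---"), so data starts at index 2.
    return [dict(zip(header, split_row(line))) for line in lines[2:]]
```

All cell values come back as strings; convert numeric columns yourself if you need to compute with them.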