A successful /parse job returns the document organized into chunks. Each chunk carries an embed field (a Markdown string that concatenates the content of its segments, ready for embedding) and a segments array with bounding boxes and metadata. The example below is a real response from a single-page test document.
{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "message": "Task succeeded",
  "file_name": "document.pdf",
  "file_type": "application/pdf",
  "page_count": 1,
  "total_chunks": 1,
  "credit_used": 1,
  "merge_tables": false,
  "created_at": "2026-05-22T11:29:17.964433Z",
  "started_at": "2026-05-22T11:29:18.040527Z",
  "finished_at": "2026-05-22T11:29:32.109418Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "file_url": "https://s3.us-east-1.amazonaws.com/...",
  "configuration": {
    "layout_analysis": "smart_layout_detection",
    "ocr_engine": "UnsiloedHawk",
    "ocr_strategy": "auto_detection",
    "merge_tables": false,
    "...": "..."
  },
  "metadata": {},
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 117,
      "embed": "## Q1 2024 Sales Report\nThe following table summarises regional sales performance...",
      "segments": [
        {
          "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
          "segment_type": "SectionHeader",
          "content": "Q1 2024 Sales Report",
          "markdown": "## Q1 2024 Sales Report",
          "html": "<h2>Q1 2024 Sales Report</h2>",
          "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.35,
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "ocr": [
            { "text": "Q1", "bbox": { "left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7 }, "confidence": null },
            { "text": "2024", "bbox": { "left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7 }, "confidence": null }
          ],
          "references": null
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) |\n| --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 |\n| ... | ... | ... | ... |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.99,
          "ocr": [],
          "references": null
        }
      ]
    }
  ]
}
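
As a rough sketch of how a client might walk this structure, the snippet below loads a trimmed copy of the example response and lists the type and page of every segment. The `summarize` helper and the `RESPONSE_JSON` constant are hypothetical names chosen for illustration, not part of any Unsiloed SDK; only field names that appear in the example are used.

```python
import json

# Trimmed copy of the example response; only the fields used below are kept.
RESPONSE_JSON = """
{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "total_chunks": 1,
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "segments": [
        {"segment_type": "SectionHeader", "page_number": 1},
        {"segment_type": "Table", "page_number": 1}
      ]
    }
  ]
}
"""


def summarize(response: dict) -> list[tuple[str, str, int]]:
    """Return (chunk_id, segment_type, page_number) for every segment."""
    rows = []
    for chunk in response.get("chunks", []):
        for segment in chunk.get("segments", []):
            rows.append(
                (chunk["chunk_id"], segment["segment_type"], segment["page_number"])
            )
    return rows


response = json.loads(RESPONSE_JSON)
rows = summarize(response)
for chunk_id, segment_type, page in rows:
    print(f"{chunk_id}: {segment_type} on page {page}")
```

Using `.get("chunks", [])` keeps the loop safe if a response without parsed content is passed in, which is an assumption about non-successful jobs rather than documented behavior.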

Top-Level Fields

These fall into three groups: identification, status, and timing; parsed content; and job configuration and metering.

Identification, Status, and Timing

  • job_id: unique identifier for the parsing job
  • status: job state (Succeeded, Failed, or an in-progress value such as Starting or Processing)
  • message: human-readable status message ("Task succeeded" when the job completes)
  • file_name: name of the uploaded file
  • file_type: MIME type of the uploaded file (e.g., application/pdf)
  • created_at: ISO 8601 timestamp when the job was created
  • started_at: ISO 8601 timestamp when processing began
  • finished_at: ISO 8601 timestamp when processing completed
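
The status and timestamp fields above can be combined to tell whether a job is done and how long it took. The following is a minimal sketch using the values from the example response; `is_finished`, `parse_timestamp`, and `job_timings` are hypothetical helper names, and the code assumes a completed job where all three timestamps are present.

```python
from datetime import datetime, timedelta

# Per the field list, Succeeded and Failed are the final states; any other
# value (such as Starting or Processing) is treated as still in progress.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def is_finished(status: str) -> bool:
    return status in TERMINAL_STATUSES


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(job: dict) -> tuple[timedelta, timedelta]:
    """Return (time spent queued, time spent processing)."""
    created = parse_timestamp(job["created_at"])
    started = parse_timestamp(job["started_at"])
    finished = parse_timestamp(job["finished_at"])
    return started - created, finished - started


job = {
    "status": "Succeeded",
    "created_at": "2026-05-22T11:29:17.964433Z",
    "started_at": "2026-05-22T11:29:18.040527Z",
    "finished_at": "2026-05-22T11:29:32.109418Z",
}

if is_finished(job["status"]):
    queued, processing = job_timings(job)
    print(f"queued for {queued.total_seconds():.3f}s")
    print(f"processed in {processing.total_seconds():.3f}s")
```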

Parsed Content

  • chunks: array of content chunks
  • total_chunks: total number of chunks
  • page_count: total number of pages in the document
  • pdf_url: temporary signed S3 URL to the processed PDF
  • file_url: temporary signed S3 URL to the original uploaded file

Job Configuration and Metering

  • configuration: the full configuration object used for this parse (OCR engine, layout strategy, segment processing settings, etc.); see the Parse API reference for every option
  • metadata: additional job metadata; usually an empty object
  • merge_tables: whether tables were merged across pages
  • credit_used: credits consumed by this job

Chunk Fields

  • chunk_id: unique identifier for the chunk
  • chunk_length: character length of the chunk’s embed content
  • embed: combined Markdown content from all segments in the chunk, ready for embedding into a vector store
  • segments: array of layout segments within the chunk
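
Since embed is described as ready for a vector store, one way to prepare chunks for indexing is sketched below. The record layout (`id`, `text`, `metadata`) and the `to_embedding_records` name are assumptions made for this example; adapt them to whatever your vector store expects.

```python
def to_embedding_records(response: dict) -> list[dict]:
    """Build one record per chunk from a parse response (hypothetical shape)."""
    records = []
    for chunk in response.get("chunks", []):
        text = chunk.get("embed", "")
        if not text.strip():
            continue  # skip chunks with nothing to embed
        pages = sorted({s["page_number"] for s in chunk.get("segments", [])})
        records.append(
            {
                "id": chunk["chunk_id"],
                "text": text,
                "metadata": {
                    "job_id": response.get("job_id"),
                    "file_name": response.get("file_name"),
                    "pages": pages,
                },
            }
        )
    return records


# Sample input that reuses identifiers from the example response; the embed
# text here is shortened for illustration.
sample = {
    "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
    "file_name": "document.pdf",
    "chunks": [
        {
            "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
            "embed": "## Q1 2024 Sales Report",
            "segments": [{"page_number": 1}, {"page_number": 1}],
        }
    ],
}

records = to_embedding_records(sample)
print(records[0]["id"], records[0]["metadata"]["pages"])
```

Keeping chunk_id as the record id makes it possible to map a search hit back to the chunk's segments and their bounding boxes.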

Segment Fields

  • segment_id: unique identifier for the segment
  • segment_type: element classification; see the Element Types reference for the full list
  • content: plain-text content of the segment (omitted for Signature segments)
  • markdown: Markdown-formatted content
  • html: HTML-formatted content
  • image: signed S3 URL to a cropped image of the segment (present for most types; omitted for Signature)
  • bbox: bounding box relative to the page, with left, top, width, height in points
  • page_number: page where the segment appears
  • page_width / page_height: page dimensions in the same coordinate space as bbox, used as the reference when scaling or normalizing coordinates
  • confidence: model confidence score (0–1) for element detection
  • ocr: array of word-level OCR results
  • references: references to related segments; typically null
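
To illustrate how these fields work together, the sketch below filters a chunk's segments by type and confidence, then converts a page-level bbox into 0-1 fractions of the page size. It assumes bbox and page_width / page_height share the same units and a top-left origin; `segments_of_type` and `normalized_bbox` are hypothetical helper names.

```python
def segments_of_type(chunk: dict, segment_type: str, min_confidence: float = 0.0) -> list[dict]:
    """Return segments of the given type at or above a confidence threshold."""
    return [
        s
        for s in chunk.get("segments", [])
        if s.get("segment_type") == segment_type
        and (s.get("confidence") or 0.0) >= min_confidence
    ]


def normalized_bbox(segment: dict) -> dict:
    """Scale a segment's bbox to fractions of the page width and height."""
    bbox = segment["bbox"]
    page_w, page_h = segment["page_width"], segment["page_height"]
    return {
        "left": bbox["left"] / page_w,
        "top": bbox["top"] / page_h,
        "right": (bbox["left"] + bbox["width"]) / page_w,
        "bottom": (bbox["top"] + bbox["height"]) / page_h,
    }


# Two segments with values copied from the example response.
chunk = {
    "segments": [
        {
            "segment_type": "SectionHeader",
            "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
            "page_width": 1191.0,
            "page_height": 1684.0,
            "confidence": 0.35,
        },
        {
            "segment_type": "Table",
            "bbox": {"left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5},
            "page_width": 1191.0,
            "page_height": 1684.0,
            "confidence": 0.99,
        },
    ]
}

tables = segments_of_type(chunk, "Table", min_confidence=0.5)
box = normalized_bbox(tables[0])
print({k: round(v, 3) for k, v in box.items()})
```

Normalized coordinates are convenient when drawing overlays on a page image rendered at a different resolution than the one reported in the response.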

OCR Item Fields

Each item in a segment’s ocr array describes one word the OCR engine recognized within that segment. The bounding box is relative to the segment’s cropped image, not the full page.

  • text: the recognized word or token
  • bbox: bounding box relative to the segment’s image, with left, top, width, height
  • confidence: per-word model confidence (0–1), or null when not reported
  • color: optional r, g, b, and hex sub-fields, present only when extract_colors: true is set in the parse configuration
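
Because OCR boxes are relative to the segment image, placing a word on the page requires combining them with the segment's own bbox. The sketch below does this by simple offsetting. It assumes the cropped image uses the same scale as the page coordinates and that its top-left corner coincides with the segment bbox's top-left corner; neither is confirmed here, so verify against your own documents. `ocr_word_to_page_bbox` is a hypothetical helper name.

```python
def ocr_word_to_page_bbox(segment_bbox: dict, word_bbox: dict) -> dict:
    """Translate a word bbox from segment-image space to page space.

    Assumes no scaling or padding between the segment crop and the page.
    """
    return {
        "left": segment_bbox["left"] + word_bbox["left"],
        "top": segment_bbox["top"] + word_bbox["top"],
        "width": word_bbox["width"],
        "height": word_bbox["height"],
    }


# Values copied from the SectionHeader segment in the example response.
segment = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "ocr": [
        {
            "text": "Q1",
            "bbox": {"left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7},
            "confidence": None,  # JSON null: confidence not reported
        },
        {
            "text": "2024",
            "bbox": {"left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7},
            "confidence": None,
        },
    ],
}

page_words = [
    {"text": item["text"], "bbox": ocr_word_to_page_bbox(segment["bbox"], item["bbox"])}
    for item in segment["ocr"]
]
joined_text = " ".join(word["text"] for word in page_words)
print(joined_text, page_words[0]["bbox"])
```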