Response Format - Unsiloed AI

A successful /parse job returns the document organized into chunks. Each chunk carries an embed field (a Markdown string concatenated from the chunk's segments, ready for embedding) and a segments array whose items include bounding boxes and metadata. The example below is a real response from a single-page test document.

{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "message": "Task succeeded",
  "file_name": "document.pdf",
  "file_type": "application/pdf",
  "page_count": 1,
  "total_chunks": 1,
  "credit_used": 1,
  "merge_tables": false,
  "created_at": "2026-05-22T11:29:17.964433Z",
  "started_at": "2026-05-22T11:29:18.040527Z",
  "finished_at": "2026-05-22T11:29:32.109418Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "file_url": "https://s3.us-east-1.amazonaws.com/...",
  "configuration": {
    "layout_analysis": "smart_layout_detection",
    "ocr_engine": "UnsiloedHawk",
    "ocr_strategy": "auto_detection",
    "merge_tables": false,
    "...": "..."
  },
  "metadata": {},
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 117,
      "embed": "## Q1 2024 Sales Report\nThe following table summarises regional sales performance...",
      "segments": [
        {
          "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
          "segment_type": "SectionHeader",
          "content": "Q1 2024 Sales Report",
          "markdown": "## Q1 2024 Sales Report",
          "html": "<h2>Q1 2024 Sales Report</h2>",
          "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.35,
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "ocr": [
            { "text": "Q1", "bbox": { "left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7 }, "confidence": null },
            { "text": "2024", "bbox": { "left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7 }, "confidence": null }
          ],
          "references": null
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) |\n| --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 |\n| ... | ... | ... | ... |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.99,
          "ocr": [],
          "references": null
        }
      ]
    }
  ]
}

Top-Level Fields

The top-level fields fall into three groups: identification, status, and timing; parsed content; and job configuration and metering.

Identification, Status, and Timing

job_id: unique identifier for the parsing job
status: job state (Succeeded, Failed, or an in-progress value such as Starting or Processing)
message: human-readable status message ("Task succeeded" when the job completes)
file_name: name of the uploaded file
file_type: MIME type of the uploaded file (e.g., application/pdf)
created_at: ISO 8601 timestamp when the job was created
started_at: ISO 8601 timestamp when processing began
finished_at: ISO 8601 timestamp when processing completed
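The three timestamps can be combined to measure how long a job waited and how long it ran. The following is a minimal sketch, assuming the response has already been decoded into a Python dict; the helper names parse_ts and job_timings are hypothetical and not part of any Unsiloed SDK.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-05-22T11:29:17.964433Z"."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(job: dict) -> dict:
    """Return queue and processing durations in seconds (hypothetical helper)."""
    created = parse_ts(job["created_at"])
    started = parse_ts(job["started_at"])
    finished = parse_ts(job["finished_at"])
    return {
        "queued_seconds": (started - created).total_seconds(),
        "processing_seconds": (finished - started).total_seconds(),
    }
```

This assumes all three fields are present, so it is only meaningful once status is Succeeded or Failed.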

Parsed Content

chunks: array of content chunks
total_chunks: total number of chunks
page_count: total number of pages in the document
pdf_url: temporary signed S3 URL to the processed PDF
file_url: temporary signed S3 URL to the original uploaded file

Job Configuration and Metering

configuration: the full configuration object used for this parse (OCR engine, layout strategy, segment processing settings, etc.); see the Parse API reference for every option
metadata: additional job metadata; usually an empty object
merge_tables: whether tables were merged across pages
credit_used: credits consumed by this job

Chunk Fields

chunk_id: unique identifier for the chunk
chunk_length: character length of the chunk’s embed content
embed: combined Markdown content from all segments in the chunk, ready for embedding into a vector store
segments: array of layout segments within the chunk
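As an illustration of how the chunk fields are typically consumed, the sketch below collects each chunk's embed string for a vector store and filters segments by type. It assumes the response has been decoded into a Python dict; collect_embeds and segments_of_type are hypothetical helper names.

```python
from typing import Any


def collect_embeds(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (chunk_id, embed) pairs, one per chunk, in response order."""
    return [(chunk["chunk_id"], chunk["embed"]) for chunk in response.get("chunks", [])]


def segments_of_type(response: dict[str, Any], segment_type: str) -> list[dict[str, Any]]:
    """Return every segment across all chunks whose segment_type matches."""
    return [
        segment
        for chunk in response.get("chunks", [])
        for segment in chunk.get("segments", [])
        if segment.get("segment_type") == segment_type
    ]
```

For the example response above, segments_of_type(response, "Table") would return the single Table segment.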

Segment Fields

segment_id: unique identifier for the segment
segment_type: element classification; see the Element Types reference for the full list
content: plain-text content of the segment (omitted for Signature segments)
markdown: Markdown-formatted content
html: HTML-formatted content
image: signed S3 URL to a cropped image of the segment (present for most types; omitted for Signature)
bbox: bounding box relative to the page, with left, top, width, height in points
page_number: page where the segment appears
page_width / page_height: page dimensions, in the same coordinate space as bbox
confidence: model confidence score (0–1) for element detection
ocr: array of word-level OCR results
references: references to related segments; typically null
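Because each segment carries its own page_width and page_height, a bbox can be rescaled without any other context. The sketch below converts a segment's bbox to fractions of the page, which is convenient when drawing overlays on a page image rendered at a different resolution. It assumes bbox and the page dimensions share one coordinate space; normalized_bbox is a hypothetical helper name.

```python
def normalized_bbox(segment: dict) -> dict:
    """Express a segment's bbox as fractions (0 to 1) of the page size."""
    bbox = segment["bbox"]
    page_w = segment["page_width"]
    page_h = segment["page_height"]
    return {
        "left": bbox["left"] / page_w,
        "top": bbox["top"] / page_h,
        "right": (bbox["left"] + bbox["width"]) / page_w,
        "bottom": (bbox["top"] + bbox["height"]) / page_h,
    }


# Values taken from the SectionHeader segment in the example response.
header = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "page_width": 1191.0,
    "page_height": 1684.0,
}
fractions = normalized_bbox(header)
```

Multiplying the fractions by the pixel size of a rendered page image gives the overlay rectangle for that image.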

OCR Item Fields

Each item in a segment’s ocr array describes one word the OCR engine recognized within that segment. The bounding box is relative to the segment’s cropped image, not the full page.

text: the recognized word or token
bbox: bounding box relative to the segment’s image, with left, top, width, height
confidence: per-word model confidence (0–1), or null when not reported
color: optional r, g, b, and hex sub-fields, present only when extract_colors: true is set in the parse configuration
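Since OCR boxes are relative to the segment's cropped image, placing a word on the full page means offsetting it by the segment's own bbox. The sketch below does that under two assumptions that this page does not confirm: the cropped image uses the same scale as the page coordinates, and the crop has no padding around the segment. The name ocr_word_to_page_bbox is hypothetical.

```python
def ocr_word_to_page_bbox(segment: dict, word: dict) -> dict:
    """Translate a word bbox from segment-image space to page space.

    Assumption: the crop is unpadded and at the same scale as the page
    coordinates, so a simple offset by the segment origin is enough.
    """
    seg_box = segment["bbox"]
    word_box = word["bbox"]
    return {
        "left": seg_box["left"] + word_box["left"],
        "top": seg_box["top"] + word_box["top"],
        "width": word_box["width"],
        "height": word_box["height"],
    }


# Values taken from the SectionHeader segment in the example response.
segment = {"bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5}}
word = {"text": "Q1", "bbox": {"left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7}}
page_box = ocr_word_to_page_bbox(segment, word)
```

If the cropped image turns out to be rendered at a different scale, the word coordinates would first need to be scaled by the ratio of the segment's bbox size to the image's pixel size.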
