# Response Format

> Canonical response shape from the /parse endpoint, with a field-by-field reference.

A successful `/parse` job returns the document organized into chunks. Each chunk has an `embed` Markdown string (concatenated content from its segments, ready for embedding) and an array of segments with bounding boxes and metadata. The example below is a real response from a single-page test document.

```json theme={null}
{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "message": "Task succeeded",
  "file_name": "document.pdf",
  "file_type": "application/pdf",
  "page_count": 1,
  "total_chunks": 1,
  "credit_used": 1,
  "merge_tables": false,
  "created_at": "2026-05-22T11:29:17.964433Z",
  "started_at": "2026-05-22T11:29:18.040527Z",
  "finished_at": "2026-05-22T11:29:32.109418Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "file_url": "https://s3.us-east-1.amazonaws.com/...",
  "configuration": {
    "layout_analysis": "smart_layout_detection",
    "ocr_engine": "UnsiloedHawk",
    "ocr_strategy": "auto_detection",
    "merge_tables": false,
    "...": "..."
  },
  "metadata": {},
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 117,
      "embed": "## Q1 2024 Sales Report\nThe following table summarises regional sales performance...",
      "segments": [
        {
          "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
          "segment_type": "SectionHeader",
          "content": "Q1 2024 Sales Report",
          "markdown": "## Q1 2024 Sales Report",
          "html": "<h2>Q1 2024 Sales Report</h2>",
          "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.35,
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "ocr": [
            { "text": "Q1", "bbox": { "left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7 }, "confidence": null },
            { "text": "2024", "bbox": { "left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7 }, "confidence": null }
          ],
          "references": null
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) |\n| --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 |\n| ... | ... | ... | ... |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.99,
          "ocr": [],
          "references": null
        }
      ]
    }
  ]
}
```

## Top-Level Fields

These fall into three groups: identification, status, and timing; parsed content; and job configuration and metering.

### Identification, Status, and Timing

* **`job_id`:** unique identifier for the parsing job
* **`status`:** job state (`Succeeded`, `Failed`, or an in-progress value such as `Starting` or `Processing`)
* **`message`:** human-readable status message (`"Task succeeded"` when the job completes)
* **`file_name`:** name of the uploaded file
* **`file_type`:** MIME type of the uploaded file (e.g., `application/pdf`)
* **`created_at`:** ISO 8601 timestamp when the job was created
* **`started_at`:** ISO 8601 timestamp when processing began
* **`finished_at`:** ISO 8601 timestamp when processing completed
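
The status and timestamp fields above are enough to tell whether a job has finished and how long it took. The following is a minimal sketch, not an official client: `TERMINAL_STATUSES`, `parse_ts`, and `job_timing` are hypothetical helper names, and treating every status other than `Succeeded` or `Failed` as in progress is an assumption based on the list above.

```python
from datetime import datetime

# Assumption: only these two documented states are final;
# any other value is treated as still in progress.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-05-22T11:29:17.964433Z'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timing(job: dict) -> dict:
    """Return queue and processing durations (in seconds) for a finished job."""
    created = parse_ts(job["created_at"])
    started = parse_ts(job["started_at"])
    finished = parse_ts(job["finished_at"])
    return {
        "finished": job["status"] in TERMINAL_STATUSES,
        "queued_seconds": (started - created).total_seconds(),
        "processing_seconds": (finished - started).total_seconds(),
    }


# Values copied from the example response at the top of this page.
example_job = {
    "status": "Succeeded",
    "created_at": "2026-05-22T11:29:17.964433Z",
    "started_at": "2026-05-22T11:29:18.040527Z",
    "finished_at": "2026-05-22T11:29:32.109418Z",
}
print(job_timing(example_job))
```

For the example response this reports roughly 0.08 seconds in the queue and about 14 seconds of processing.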

### Parsed Content

* **`chunks`:** array of content chunks
* **`total_chunks`:** total number of chunks
* **`page_count`:** total number of pages in the document
* **`pdf_url`:** temporary signed S3 URL to the processed PDF; `null` unless `include_url=true` is set (see [URL fields](#url-fields) below)
* **`file_url`:** temporary signed S3 URL to the original uploaded file; `null` unless `include_url=true` is set

### Job Configuration and Metering

* **`configuration`:** the full configuration object used for this parse (OCR engine, layout strategy, segment processing settings, etc.); see the [Parse API reference](/api-reference/parser/parse-document) for every option
* **`metadata`:** additional job metadata; usually an empty object
* **`merge_tables`:** whether tables were merged across pages
* **`credit_used`:** credits consumed by this job

## Chunk Fields

* **`chunk_id`:** unique identifier for the chunk
* **`chunk_length`:** character length of the chunk's `embed` content
* **`embed`:** combined Markdown content from all segments in the chunk, ready for embedding into a vector store
* **`segments`:** array of layout segments within the chunk
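
Because `embed` is already the combined Markdown for a chunk, preparing input for a vector store mostly means flattening the response into one record per chunk. The sketch below assumes a parsed response dictionary shaped like the example above; `chunks_to_records` and the record keys (`id`, `text`, `pages`, `segment_types`) are hypothetical names chosen for illustration, not part of the API.

```python
def chunks_to_records(result: dict) -> list[dict]:
    """Flatten a parse result into one embedding-ready record per chunk."""
    records = []
    for chunk in result.get("chunks", []):
        segments = chunk.get("segments", [])
        records.append({
            "id": chunk["chunk_id"],
            "text": chunk["embed"],
            # Page numbers and segment types are kept as metadata for later filtering.
            "pages": sorted({s["page_number"] for s in segments}),
            "segment_types": [s["segment_type"] for s in segments],
        })
    return records


# Trimmed-down stand-in for a real response.
sample_result = {
    "chunks": [
        {
            "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
            "embed": "## Q1 2024 Sales Report\n...",
            "segments": [
                {"segment_type": "SectionHeader", "page_number": 1},
                {"segment_type": "Table", "page_number": 1},
            ],
        }
    ]
}
for record in chunks_to_records(sample_result):
    print(record["id"], record["pages"], record["segment_types"])
```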

## Segment Fields

* **`segment_id`:** unique identifier for the segment
* **`segment_type`:** element classification; see the [Element Types](/document-processing/parsing/element-types) reference for the full list
* **`content`:** plain-text content of the segment (omitted for `Signature` segments)
* **`markdown`:** Markdown-formatted content
* **`html`:** HTML-formatted content
* **`image`:** signed S3 URL to a cropped image of the segment; `null` unless `include_url=true` is set (see [URL fields](#url-fields) below). Present for most segment types; omitted for `Signature`
* **`bbox`:** bounding box relative to the page, with `left`, `top`, `width`, `height` in render pixels
* **`page_number`:** page where the segment appears
* **`page_width` / `page_height`:** dimensions, in pixels, of the rendered page that the bounding boxes are measured against; to convert a coordinate back to PDF points, divide it by the ratio of `page_width` to the page's width in points
* **`confidence`:** model confidence score (0–1) for element detection
* **`ocr`:** array of word-level OCR results
* **`references`:** references to related segments; typically `null`
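
The pixel-to-point conversion described for `page_width` / `page_height` can be sketched as follows. The response does not include the page size in points, so `page_width_pt` is an assumption that has to come from the source PDF; `bbox_to_points` is a hypothetical helper, and it assumes the page was rendered with the same scale on both axes.

```python
def bbox_to_points(bbox: dict, page_width_px: float, page_width_pt: float) -> dict:
    """Convert a segment bbox from render pixels to PDF points."""
    # Pixels per point, e.g. 2.0 when the page was rendered at twice its point size.
    scale = page_width_px / page_width_pt
    return {key: bbox[key] / scale for key in ("left", "top", "width", "height")}


# Segment bbox from the example response; 595.5 pt is a made-up page width
# used here only to keep the arithmetic easy to follow.
header_bbox = {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5}
print(bbox_to_points(header_bbox, page_width_px=1191.0, page_width_pt=595.5))
```

The result keeps the same top-left origin as the response. PDF user space normally has its origin at the bottom-left, so code that draws directly into a PDF may also need to flip the vertical axis using the page height in points.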

## OCR Item Fields

Each item in a segment's `ocr` array describes one word the OCR engine recognized within that segment. The bounding box is relative to the segment's cropped image, not the full page.

* **`text`:** the recognized word or token
* **`bbox`:** bounding box relative to the segment's image, with `left`, `top`, `width`, `height`
* **`confidence`:** per-word model confidence (0–1), or `null` when not reported
* **`color`:** optional `r`, `g`, `b`, and `hex` sub-fields, present only when `extract_colors: true` is set in the parse configuration
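
Since OCR boxes are relative to the segment's cropped image, placing a word on the full page requires adding the segment's own offset. The sketch below assumes the crop uses the same pixel scale as the page render and has no padding; neither is stated on this page, so verify both against your own data. `ocr_word_to_page_bbox` is a hypothetical helper name.

```python
def ocr_word_to_page_bbox(segment: dict, word: dict) -> dict:
    """Translate a word-level OCR bbox into page coordinates."""
    seg_box, word_box = segment["bbox"], word["bbox"]
    return {
        # Assumption: crop and page share one pixel scale, so only an offset is needed.
        "left": seg_box["left"] + word_box["left"],
        "top": seg_box["top"] + word_box["top"],
        "width": word_box["width"],
        "height": word_box["height"],
    }


# Values taken from the SectionHeader segment in the example response.
header_segment = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "ocr": [
        {"text": "Q1", "bbox": {"left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7}},
        {"text": "2024", "bbox": {"left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7}},
    ],
}
for ocr_word in header_segment["ocr"]:
    print(ocr_word["text"], ocr_word_to_page_bbox(header_segment, ocr_word))
```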

## URL Fields

By default, every file URL in the response is returned as `null` so the response (and any log that captures it) never exposes your storage bucket, region, or path. The gated fields are:

* `pdf_url`
* `file_url`
* `output_file_url`
* `exports` (presigned export download URLs)
* segment `image` (cropped segment images)
* `configuration.input_file_url`

To receive the real URLs, opt in when polling for results with either the `include_url=true` query parameter or the `include-url: true` header on `GET /parse/{job_id}`:

<CodeGroup>
  ```bash cURL theme={null}
  curl "https://prod.visionapi.unsiloed.ai/parse/$JOB_ID?include_url=true" \
    -H "api-key: $UNSILOED_API_KEY"
  ```

  ```python Python theme={null}
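  import os
  import requests
  BASE_URL = "https://prod.visionapi.unsiloed.ai"
  API_KEY = os.environ["UNSILOED_API_KEY"]
  job_id = os.environ["JOB_ID"]  # ID of a previously submitted parse job
  result = requests.get(
      f"{BASE_URL}/parse/{job_id}",
      params={"include_url": "true"},
      headers={"api-key": API_KEY},
  ).json()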
  ```
</CodeGroup>

<Note>
  `include_url` does not rewrite or re-sign URLs; when set to `true` they are returned exactly as generated. Presigned URLs are time-limited, so fetch any files you need promptly.
</Note>
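
Because the gated fields are `null` unless you opt in, downstream code should check for a value before trying to download anything. The sketch below is illustrative only: `available_urls` is a hypothetical helper, and it covers just `pdf_url`, `file_url`, and segment `image`, since the shapes of `exports` and `output_file_url` are not described on this page.

```python
def available_urls(result: dict) -> dict:
    """Collect the non-null file URLs found in a parse result."""
    urls = {}
    for key in ("pdf_url", "file_url"):
        if result.get(key):
            urls[key] = result[key]
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            # `image` may be null (no opt-in) or absent (e.g. Signature segments).
            if segment.get("image"):
                urls[f"segment:{segment['segment_id']}"] = segment["image"]
    return urls


# Default response: every gated field is null, so nothing is collected.
default_result = {
    "pdf_url": None,
    "file_url": None,
    "chunks": [{"segments": [{"segment_id": "s1", "image": None}]}],
}
print(available_urls(default_result))
```

An empty result from a job that succeeded is a sign that the request was made without `include_url=true` or the `include-url: true` header.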
