# Element Types

> The segment types returned by the /parse endpoint, with example response shapes pulled from real parsed documents.

Every segment in a parsed document carries a `segment_type` field naming the layout region it came from. The parser recognizes the types listed below, divided into **text elements** (regions whose meaning lives in their characters and structure) and **visual elements** (regions whose meaning lives in their layout, image content, or rendered form). Two of them (`KeyValuePair` and `Signature`) only appear when you submit with `layout_analysis=advanced_layout_detection`.

All segments share the same core fields: `bbox`, `confidence`, `content`, `markdown`, `html`, `ocr`, and location metadata. What changes by type is what those fields contain, and a couple of types omit specific fields entirely. The sections below show a real response sample for each type.
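
As a rough illustration of the text/visual split described above, the sketch below buckets segments by `segment_type`. It assumes the segments have already been pulled out of the response into a flat list of dicts; `group_by_category` and the two set names are hypothetical helpers, not part of the API.

```python
from collections import defaultdict

# Type names are taken from this page; the two sets mirror its headings.
TEXT_TYPES = {
    "Text", "Title", "SectionHeader", "ListItem", "Caption",
    "Footnote", "PageHeader", "PageFooter", "KeyValuePair",
}
VISUAL_TYPES = {"Table", "Picture", "Formula", "Signature"}


def group_by_category(segments):
    """Bucket segments into 'text', 'visual', or 'unknown' by segment_type."""
    groups = defaultdict(list)
    for seg in segments:
        kind = seg.get("segment_type")
        if kind in TEXT_TYPES:
            groups["text"].append(seg)
        elif kind in VISUAL_TYPES:
            groups["visual"].append(seg)
        else:
            # Keep unrecognized types visible instead of silently dropping them.
            groups["unknown"].append(seg)
    return dict(groups)


# Abbreviated sample segments, assumed to be already extracted from a response.
sample = [
    {"segment_type": "Title", "content": "BERKSHIRE HATHAWAY INC."},
    {"segment_type": "Table", "content": "Region Sales Rep ..."},
    {"segment_type": "Signature"},
]
groups = group_by_category(sample)
print({name: len(items) for name, items in groups.items()})
```

Routing unknown types into their own bucket is a defensive choice, so that a type not listed on this page would surface rather than disappear.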

## Text Elements

These segments carry their meaning in text, so the `markdown` and `html` fields use semantic markup like headers, italics, list syntax, and footnote references to reflect each type.

### Text

Regular paragraph and inline text. The `content` field carries the plain text, `markdown` is the same text with line breaks preserved, and `html` marks each line break with a `<br />` tag.

```json
{
  "segment_id": "13d55851-d0fc-4999-a508-ab82d9a64443",
  "segment_type": "Text",
  "content": "The following table summarises regional sales performance for Q1\n2024.",
  "markdown": "The following table summarises regional sales performance for Q1\n2024.",
  "html": "The following table summarises regional sales performance for Q1<br />\n2024.",
  "bbox": { "left": 60.8, "top": 152.6, "width": 714.6, "height": 24.3 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.99
}
```

### Title

Document titles and main headings. Rendered as a top-level Markdown header (`#`) and `<h1>` in HTML, distinct from `SectionHeader` which uses `##`/`<h2>`.

```json
{
  "segment_id": "24063562-721c-4122-87f2-c376ac0296f0",
  "segment_type": "Title",
  "content": "BERKSHIRE HATHAWAY INC.",
  "markdown": "# BERKSHIRE HATHAWAY INC.",
  "html": "<h1>BERKSHIRE HATHAWAY INC.</h1>",
  "bbox": { "left": 308.5, "top": 625.8, "width": 761.0, "height": 70.5 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.49
}
```

### SectionHeader

Section titles and subheadings that define the document's hierarchy. The parser renders these as `##` in `markdown` and `<h2>` in `html`.

```json
{
  "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
  "segment_type": "SectionHeader",
  "content": "Q1 2024 Sales Report",
  "markdown": "## Q1 2024 Sales Report",
  "html": "<h2>Q1 2024 Sales Report</h2>",
  "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.35
}
```
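
Because `Title` and `SectionHeader` map to heading levels 1 and 2, a document outline can be sketched from them. The example below is a minimal sketch that assumes segments arrive in reading order; `build_outline` is a hypothetical helper, and the sample list combines abbreviated segments from the examples on this page purely for illustration.

```python
def build_outline(segments):
    """Return (level, text, page_number) for each Title or SectionHeader, in input order."""
    levels = {"Title": 1, "SectionHeader": 2}
    outline = []
    for seg in segments:
        level = levels.get(seg.get("segment_type"))
        if level is not None:
            outline.append((level, seg.get("content", ""), seg.get("page_number")))
    return outline


segments = [
    {"segment_type": "Title", "content": "BERKSHIRE HATHAWAY INC.", "page_number": 1},
    {"segment_type": "Text", "content": "The following table ...", "page_number": 1},
    {"segment_type": "SectionHeader", "content": "Q1 2024 Sales Report", "page_number": 1},
]
for level, text, page in build_outline(segments):
    # Indent by heading level so the hierarchy is visible.
    print("  " * (level - 1) + f"{text} (p. {page})")
```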

### ListItem

Bulleted and numbered list entries. The `markdown` field renders the item with a leading dash, and `html` wraps the entry in `<ul>` (with a nested `<ol>` if the source list was numbered).

```json
{
  "segment_id": "32217825-2ddc-4121-be70-2b3e23e2ab97",
  "segment_type": "ListItem",
  "content": "1. Operating Conditions — 1966",
  "markdown": "- 1. Operating Conditions — 1966",
  "html": "<ul><li><ol>\n<li>Operating Conditions — 1966</li>\n</ol></li></ul>",
  "bbox": { "left": 441.9, "top": 430.0, "width": 288.8, "height": 20.7 },
  "page_number": 3,
  "page_width": 1222.0,
  "page_height": 1576.0,
  "confidence": 0.95
}
```
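
In the sample above, the `content` of a numbered item keeps its ordinal (`1.`). If the number and the text are needed separately, something like the following could work. It is a sketch that assumes ordinals look like `1.` or `1)`; `split_list_item` is a hypothetical helper, and the input string is shortened from the sample.

```python
import re

# Matches a leading integer followed by "." or ")" and whitespace.
_ORDINAL = re.compile(r"^(\d+)[.)]\s+(.*)$", re.DOTALL)


def split_list_item(content):
    """Return (number, text) for a numbered item, or (None, content) otherwise."""
    match = _ORDINAL.match(content)
    if match:
        return int(match.group(1)), match.group(2)
    return None, content


print(split_list_item("1. Operating Conditions"))
print(split_list_item("Operating Conditions"))
```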

### Caption

Text captions associated with images, figures, or tables. The `markdown` field wraps the caption in italics (`_..._`), and `html` wraps it in a `<span class="caption">` for downstream styling.

```json
{
  "segment_id": "63d646e8-0cb1-4325-8759-86625a51b0f9",
  "segment_type": "Caption",
  "content": "Figure 1: The Transformer - model\narchitecture.",
  "markdown": "_Figure 1: The Transformer - model\narchitecture._",
  "html": "<span class=\"caption\">Figure 1: The Transformer - model<br />\narchitecture.</span>",
  "bbox": { "left": 418.8, "top": 808.4, "width": 385.1, "height": 21.0 },
  "page_number": 3,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}
```

### Footnote

Footnote text and references. The `markdown` field uses Markdown footnote syntax (`[^...]`), and `html` wraps the body in a `<span class="footnote">`.

```json
{
  "segment_id": "840069a9-fb5e-4dcb-ad37-459bd4ff29f1",
  "segment_type": "Footnote",
  "content": "∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention...",
  "markdown": "[^∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention...]",
  "html": "<span class=\"footnote\">∗Equal contribution. Listing order is random...</span>",
  "bbox": { "left": 214.4, "top": 1196.5, "width": 793.9, "height": 176.2 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}
```

### PageHeader

Header content at the top of a page, such as library stamps, document titles repeating across pages, or running headers. The `markdown` and `html` fields carry the raw text without semantic markup. These segments are often worth filtering out before RAG ingestion.

```json
{
  "segment_id": "0f1fce17-da69-4698-b7a6-3abf954ee41e",
  "segment_type": "PageHeader",
  "content": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.\nCORPORATION FILE",
  "markdown": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.\nCORPORATION FILE",
  "html": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.<br />\nCORPORATION FILE",
  "bbox": { "left": 998.3, "top": 2.6, "width": 193.5, "height": 76.7 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.34
}
```

### PageFooter

Footer content at the bottom of a page, typically page numbers, copyright notices, or document IDs. Like `PageHeader` segments, these are often filtered out before embedding.

```json
{
  "segment_id": "997cbb53-0d58-49dc-b10e-e997ee14aafc",
  "segment_type": "PageFooter",
  "content": "5",
  "markdown": "5",
  "html": "5",
  "bbox": { "left": 611.0, "top": 1410.1, "width": 12.4, "height": 21.6 },
  "page_number": 7,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.94
}
```
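
Filtering out page headers and footers before embedding can be as small as the sketch below. It assumes a flat list of segment dicts; `drop_page_furniture` is a hypothetical name.

```python
FURNITURE_TYPES = {"PageHeader", "PageFooter"}


def drop_page_furniture(segments):
    """Return the segments that are neither page headers nor page footers."""
    return [s for s in segments if s.get("segment_type") not in FURNITURE_TYPES]


segments = [
    {"segment_type": "PageHeader", "content": "CORPORATION FILE"},
    {"segment_type": "Text", "content": "The following table ..."},
    {"segment_type": "PageFooter", "content": "5"},
]
body = drop_page_furniture(segments)
print([s["segment_type"] for s in body])
```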

### KeyValuePair

A labeled field in a form or document, like `Passport No :` or `Invoice Date:`. Only returned under `layout_analysis=advanced_layout_detection`. The label is captured in this segment; the value typically appears as a separate adjacent `Text` segment. The `html` field wraps the label in a `<div class="key-value-pair">` so downstream code can style or pair it.

```json
{
  "segment_id": "85fcc4f1-f637-4d57-a698-c4bc4bfb0d3e",
  "segment_type": "KeyValuePair",
  "content": "Passport No :",
  "markdown": "Passport No :",
  "html": "<div class=\"key-value-pair\">Passport No :</div>",
  "bbox": { "left": 95.5, "top": 217.9, "width": 110.4, "height": 22.2 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 1.0
}
```
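
Since the value usually arrives as a separate `Text` segment, pairing it with its label is left to the caller. One possible heuristic, shown here as a sketch only, is to take the nearest `Text` segment to the right of the label that overlaps it vertically on the same page. The heuristic, the helper names, and the value segment in the sample are all assumptions made for illustration; real forms may place values below the label instead.

```python
def vertical_overlap(a, b):
    """Height of the vertical overlap between two bbox dicts (0 if none)."""
    top = max(a["top"], b["top"])
    bottom = min(a["top"] + a["height"], b["top"] + b["height"])
    return max(0.0, bottom - top)


def pair_labels_with_values(segments):
    """Map each KeyValuePair label to the nearest Text segment to its right."""
    pairs = {}
    texts = [s for s in segments if s["segment_type"] == "Text"]
    for label in (s for s in segments if s["segment_type"] == "KeyValuePair"):
        lb = label["bbox"]
        candidates = [
            t for t in texts
            if t["page_number"] == label["page_number"]
            and t["bbox"]["left"] >= lb["left"] + lb["width"]  # starts right of the label
            and vertical_overlap(lb, t["bbox"]) > 0            # sits on the same line
        ]
        if candidates:
            nearest = min(candidates, key=lambda t: t["bbox"]["left"])
            pairs[label["content"]] = nearest["content"]
    return pairs


segments = [
    {"segment_type": "KeyValuePair", "content": "Passport No :", "page_number": 1,
     "bbox": {"left": 95.5, "top": 217.9, "width": 110.4, "height": 22.2}},
    # Hypothetical value segment, invented for this example.
    {"segment_type": "Text", "content": "X0000000", "page_number": 1,
     "bbox": {"left": 215.0, "top": 218.5, "width": 90.0, "height": 21.0}},
    {"segment_type": "Text", "content": "Unrelated paragraph", "page_number": 1,
     "bbox": {"left": 95.5, "top": 400.0, "width": 500.0, "height": 21.0}},
]
pairs = pair_labels_with_values(segments)
print(pairs)
```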

## Visual Elements

These segments carry their meaning in visual content or layout. The `markdown` and `html` fields contain either rendered structured content (Markdown tables, LaTeX) or AI-generated descriptions for image regions.

### Table

Tabular data with structured rows and columns. The `markdown` field carries the Markdown pipe-table syntax, `html` carries a full `<table>` with `<thead>` and `<tbody>`, and `content` is a flat plain-text approximation. The `image` field contains a signed URL to a cropped image of the table region, useful for verifying parses visually or feeding the original table to an image-input model.

```json
{
  "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
  "segment_type": "Table",
  "content": "Region Sales Rep Units Sold Revenue ($) Target ($) % of Target\nNorth Alice Brown 1,240 186,000 175,000 106%\n...",
  "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) | Target ($) | % of Target |\n| --- | --- | --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 | 175,000 | 106% |\n| ... | ... | ... | ... | ... | ... |",
  "html": "<table>\n  <thead>\n    <tr><th>Region</th><th>Sales Rep</th><th>Units Sold</th>...</tr>\n  </thead>\n  <tbody>\n    <tr><td>North</td><td>Alice Brown</td>...</tr>\n    ...\n  </tbody>\n</table>",
  "image": "https://s3.us-east-1.amazonaws.com/...",
  "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.99
}
```
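
When rows are needed as data, the pipe-table `markdown` can be split by hand. The sketch below assumes a simple table with one header row, one separator row, and no escaped `|` characters inside cells; `parse_pipe_table` is a hypothetical helper, and the sample is a shortened version of the table above.

```python
def parse_pipe_table(markdown):
    """Convert a simple Markdown pipe table into a list of row dicts keyed by header."""
    def cells(line):
        # Drop the outer pipes, then split on the inner ones.
        return [c.strip() for c in line.strip().strip("|").split("|")]

    lines = [ln for ln in markdown.splitlines() if ln.strip()]
    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the | --- | separator row
        rows.append(dict(zip(header, cells(line))))
    return rows


table_md = (
    "| Region | Sales Rep | Units Sold |\n"
    "| --- | --- | --- |\n"
    "| North | Alice Brown | 1,240 |"
)
rows = parse_pipe_table(table_md)
print(rows)
```

Values stay as strings (`"1,240"`), since number formats vary by document. For tables with merged or nested cells, the `html` field is likely the safer source.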

### Picture

Images, charts, illustrations, and diagrams. The `image` field contains a signed URL to the cropped picture itself. The `markdown` and `html` fields contain an AI-generated description of the image (not the image bytes), making the picture's visual content searchable and embeddable as text alongside the rest of the document.

```json
{
  "segment_id": "c60d89b1-373e-428d-9950-544e7c903b61",
  "segment_type": "Picture",
  "markdown": "# Image Description\n\nThe image shows a **large orange sombrero** against a **plain white background**. The hat has a tall, rounded crown and a very broad brim...",
  "html": "<h1>Image Description</h1><p>The image shows a <strong>large orange sombrero</strong>...</p>",
  "image": "https://s3.us-east-1.amazonaws.com/...",
  "bbox": { "left": 0.5, "top": -1.0, "width": 1748.5, "height": 1166.9 },
  "page_number": 2,
  "page_width": 1732.0,
  "page_height": 2262.0,
  "confidence": 0.93
}
```
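
To embed pictures alongside ordinary text, one option is to pick the generated description for image-like segments and `content` for everything else. This is a minimal sketch; `text_for_embedding` is a hypothetical helper, and the fallback order is an assumption rather than documented behavior.

```python
def text_for_embedding(segment):
    """Pick the text to embed: the description for image-like types, else content."""
    if segment.get("segment_type") in {"Picture", "Signature"}:
        return segment.get("markdown", "")
    # Fall back to markdown if content is missing or empty.
    return segment.get("content") or segment.get("markdown", "")


picture = {
    "segment_type": "Picture",
    "markdown": "# Image Description\n\nThe image shows a **large orange sombrero**...",
}
paragraph = {"segment_type": "Text", "content": "Regional sales performance for Q1 2024."}
print(text_for_embedding(picture))
print(text_for_embedding(paragraph))
```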

### Formula

Mathematical equations and expressions. The most distinctive type: the `markdown` and `html` fields contain LaTeX wrapped in `$...$`, ready to render with KaTeX, MathJax, or any other LaTeX-aware tool. The `content` field carries a plain-text OCR approximation of the equation, which is usually less reliable than the LaTeX representation.

```json
{
  "segment_id": "b4ccd4cf-01ae-4e92-b881-3bb1c335e8b3",
  "segment_type": "Formula",
  "content": "V ) = softmax(QKT )V (1)\nAttention(Q, K, √\ndk",
  "markdown": "$\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$",
  "html": "<p>$\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$</p>",
  "bbox": { "left": 438.4, "top": 928.8, "width": 569.9, "height": 52.7 },
  "page_number": 4,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}
```
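
Some renderers expect bare LaTeX rather than a `$...$` wrapped string. The sketch below strips the outer delimiters from a `Formula` segment's `markdown`. `latex_source` is a hypothetical helper, and handling `$$...$$` is a precaution, since this page only documents `$...$`.

```python
def latex_source(formula_markdown):
    """Return the LaTeX between outer $ delimiters, or the stripped text if there are none."""
    text = formula_markdown.strip()
    for delim in ("$$", "$"):  # check the longer delimiter first
        if (len(text) >= 2 * len(delim)
                and text.startswith(delim)
                and text.endswith(delim)):
            return text[len(delim):-len(delim)].strip()
    return text


markdown = r"$\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$"
print(latex_source(markdown))
```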

### Signature

A handwritten signature region. Only returned under `layout_analysis=advanced_layout_detection`. Like `Picture`, the `markdown` and `html` fields contain an AI-generated description of what the handwriting looks like, useful as searchable text. Unlike `Picture`, a `Signature` segment carries no `content` field and no `image` URL; beyond the description it has only the bounding box, page metadata, and confidence.

```json
{
  "segment_id": "b328b32d-1c37-41f2-a5f7-0366a870d238",
  "segment_type": "Signature",
  "markdown": "## Image Description\n\nThe image shows a handwritten word in dark ink on a light background.\n\n### Visible Text\n- **Dhote.**\n\n### Details\n- The handwriting is cursive and slightly slanted...",
  "html": "<h2>Image Description</h2>\n<p>The image shows a handwritten word in dark ink on a light background.</p>\n<h3>Visible Text</h3>\n<ul><li><strong>Dhote.</strong></li></ul>...",
  "bbox": { "left": 96.7, "top": 1398.4, "width": 84.2, "height": 50.8 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 1.0
}
```
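
A common use of `Signature` segments is checking which pages appear to be signed. The sketch below collects those page numbers. `pages_with_signatures` is a hypothetical helper, and a detected region says nothing about whether the signature is valid.

```python
def pages_with_signatures(segments):
    """Return the sorted page numbers that contain at least one Signature segment."""
    return sorted({
        s["page_number"]
        for s in segments
        if s.get("segment_type") == "Signature"
    })


segments = [
    {"segment_type": "Text", "page_number": 1},
    {"segment_type": "Signature", "page_number": 3},
    {"segment_type": "Signature", "page_number": 1},
    {"segment_type": "Signature", "page_number": 3},
]
signed_pages = pages_with_signatures(segments)
print(signed_pages)
```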

For the full segment shape and configuration options, see the [Parse API reference](/api-reference/parser/parse-document).