# Excel Parsing

> Parse .xls and .xlsx workbooks into structured tables with cell-range references.

Spreadsheets don't behave like documents. There are no pages to lay out, no reading order to reconstruct, and no OCR to run — a workbook is a grid of typed cells spread across one or more sheets, often with hidden rows, formulas, merged ranges, and styling that carry meaning. Excel parsing has its own pipeline and its own endpoint: **`POST /parse/excel`**.

<Note>
  Excel files go to `POST /parse/excel`, not `POST /parse`. Submitting a `.xls` or `.xlsx` to `/parse` returns a `400` with `error: "excel_not_supported_here"` pointing you here. Conversely, `/parse/excel` only accepts Excel files and rejects everything else.
</Note>
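
If your client handles mixed uploads, you can pick the endpoint before submitting rather than waiting for the `400`. The sketch below routes on file extension only; `endpoint_for` and `EXCEL_EXTENSIONS` are hypothetical names, and the server may apply stricter checks than an extension match.

```python
from pathlib import Path

# Extensions named on this page as Excel inputs.
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def endpoint_for(filename: str) -> str:
    """Return the parse endpoint path that matches the file's extension."""
    suffix = Path(filename).suffix.lower()
    return "/parse/excel" if suffix in EXCEL_EXTENSIONS else "/parse"


print(endpoint_for("workbook.xlsx"))  # /parse/excel
print(endpoint_for("report.pdf"))     # /parse
```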

## How It Differs from `/parse`

The Excel endpoint shares the same asynchronous submit-and-poll flow and the same result shape as `/parse` — you get a `job_id`, poll `GET /parse/{job_id}` until `Succeeded`, and read back chunks of segments. But none of the PDF parsing options carry over: `ocr_engine`, `layout_analysis`, `segment_processing`, `agentic_ocr`, and the [processing modes](/document-processing/parsing/processing-modes) have no effect on a workbook. Excel parsing has its own, separate set of configuration options (below).

Each sheet is converted to structured tables. The parser preserves the grid, lets you drop hidden content, controls how large tables are split, and tags every table segment with the spreadsheet cells it came from.

## Submitting a Workbook

Provide either a `file` upload or a `url` pointing to a workbook in cloud storage, but not both. Every configuration field is optional; an omitted field falls back to the pipeline default.
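
A client-side check for the either/or rule can save a round trip. This is a minimal sketch: `pick_source` is a hypothetical helper, and the example URL is a placeholder rather than a real workbook location.

```python
def pick_source(file_path=None, url=None):
    """Return ("file", path) or ("url", url); raise if neither or both are given."""
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url, not both")
    if file_path is not None:
        return ("file", file_path)
    return ("url", url)


print(pick_source(file_path="workbook.xlsx"))  # ('file', 'workbook.xlsx')
```

The returned tag tells the caller whether to build a multipart upload or pass the `url` field instead.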

<CodeGroup>
  ```python Python theme={null}
  import json
  import os
  import time
  import requests

  API_KEY = os.environ["UNSILOED_API_KEY"]
  BASE_URL = "https://prod.visionapi.unsiloed.ai"

  with open("workbook.xlsx", "rb") as f:
      response = requests.post(
          f"{BASE_URL}/parse/excel",
          headers={"api-key": API_KEY},
          files={"file": ("workbook.xlsx", f,
                          "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")},
          data={
              "exclude_hidden": "true",    # drop hidden sheets/rows/cols
              "split_large_tables": "true",
              "max_rows_per_segment": "50",
          },
      )
  response.raise_for_status()
  job_id = response.json()["job_id"]
  print(f"Job submitted: {job_id}")

  while True:
      result = requests.get(
          f"{BASE_URL}/parse/{job_id}",
          headers={"api-key": API_KEY},
      ).json()
      if result["status"] == "Succeeded":
          break
      if result["status"] == "Failed":
          raise RuntimeError(result.get("message", "parse job failed"))
      time.sleep(5)

  with open("result.json", "w") as f:
      json.dump(result, f, indent=2)
  ```

  ```javascript JavaScript theme={null}
  import fs from "node:fs";

  const API_KEY = process.env.UNSILOED_API_KEY;
  const BASE_URL = "https://prod.visionapi.unsiloed.ai";

  const form = new FormData();
  form.append("file", new Blob([fs.readFileSync("workbook.xlsx")]), "workbook.xlsx");
  form.append("exclude_hidden", "true");
  form.append("split_large_tables", "true");
  form.append("max_rows_per_segment", "50");

  const submit = await fetch(`${BASE_URL}/parse/excel`, {
    method: "POST",
    headers: { "api-key": API_KEY },
    body: form,
  });
  if (!submit.ok) throw new Error(`submit failed: HTTP ${submit.status}`);
  const { job_id } = await submit.json();
  console.log(`Job submitted: ${job_id}`);

  let result;
  while (true) {
    result = await (
      await fetch(`${BASE_URL}/parse/${job_id}`, { headers: { "api-key": API_KEY } })
    ).json();
    if (result.status === "Succeeded") break;
    if (result.status === "Failed") throw new Error(result.message ?? "parse job failed");
    await new Promise((r) => setTimeout(r, 5000));
  }
  ```

  ```bash cURL theme={null}
  # Submit
  curl -X POST "https://prod.visionapi.unsiloed.ai/parse/excel" \
    -H "api-key: $UNSILOED_API_KEY" \
    -F "file=@workbook.xlsx" \
    -F "exclude_hidden=true" \
    -F "split_large_tables=true" \
    -F "max_rows_per_segment=50"

  # Poll (substitute the job_id from the submit response)
  curl "https://prod.visionapi.unsiloed.ai/parse/$JOB_ID" \
    -H "api-key: $UNSILOED_API_KEY"
  ```
</CodeGroup>
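
The polling loops above run until the job reaches `Succeeded` or `Failed`. In production you will probably want an upper bound as well. The sketch below wraps the same loop with a deadline; `wait_for_job` and its parameters are hypothetical, and `"Processing"` stands in for whatever non-terminal status the API reports (only `Succeeded` and `Failed` are taken from this page).

```python
import time


def wait_for_job(fetch_status, timeout_s=300.0, interval_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job succeeds, fails, or the deadline passes.

    fetch_status is any zero-argument callable that returns the parsed JSON
    body of GET /parse/{job_id}.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError(result.get("message", "parse job failed"))
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Offline demo with canned responses instead of real HTTP calls.
responses = iter([{"status": "Processing"}, {"status": "Succeeded"}])
result = wait_for_job(lambda: next(responses), sleep=lambda _: None)
print(result["status"])  # Succeeded
```

Against the live API, `fetch_status` could be `lambda: requests.get(f"{BASE_URL}/parse/{job_id}", headers={"api-key": API_KEY}).json()`, reusing the names from the Python example.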

## Configuration

All fields are optional. Booleans are sent as the strings `"true"` / `"false"` in the multipart form.
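
If you keep configuration as native Python values, a small encoder avoids sending `"True"` where the form expects `"true"`. This is a sketch; `to_form_fields` is a hypothetical helper, and dropping `None` values is one possible way to let the pipeline default apply.

```python
def to_form_fields(config: dict) -> dict:
    """Convert native config values into multipart form strings."""
    fields = {}
    for key, value in config.items():
        if value is None:
            continue  # omit the field so the pipeline default applies
        if isinstance(value, bool):
            # Check bool before anything else: str(True) would give "True".
            fields[key] = "true" if value else "false"
        else:
            fields[key] = str(value)
    return fields


print(to_form_fields({
    "exclude_hidden": True,
    "max_rows_per_segment": 50,
    "table_clustering": None,
}))
# {'exclude_hidden': 'true', 'max_rows_per_segment': '50'}
```

The result can be passed as `data=` to `requests.post`, as in the Python example above.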

### Hidden Content

| Parameter               | Type    | Default | What it does                                                                                  |
| ----------------------- | ------- | ------- | --------------------------------------------------------------------------------------------- |
| `exclude_hidden`        | boolean | `false` | Drop hidden sheets, rows, columns, and styling from the output (gates the four fields below). |
| `exclude_hidden_sheets` | boolean | `true`  | When excluding hidden content, also drop hidden sheets.                                       |
| `exclude_hidden_rows`   | boolean | `true`  | When excluding hidden content, also drop hidden rows.                                         |
| `exclude_hidden_cols`   | boolean | `true`  | When excluding hidden content, also drop hidden columns.                                      |
| `exclude_images`        | boolean | `false` | When excluding hidden content, also drop embedded/pasted images.                              |

### Tables

| Parameter              | Type    | Default      | What it does                                                                                                          |
| ---------------------- | ------- | ------------ | --------------------------------------------------------------------------------------------------------------------- |
| `split_large_tables`   | boolean | `true`       | Break big tables into smaller segments so each stays a manageable size.                                               |
| `max_rows_per_segment` | integer | `50`         | Maximum rows per segment when `split_large_tables` is enabled.                                                        |
| `table_clustering`     | string  | `"accurate"` | How aggressively to group adjacent ranges into tables: `accurate` (full analysis, best but slower), `fast`, or `off`. |
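
To estimate how many segments a long table might produce, you can model the split as consecutive row windows. This is only a rough sizing aid under the assumption that rows are chunked in order with no special handling for header rows; it is not the parser's actual algorithm, and `split_row_ranges` is a hypothetical name.

```python
def split_row_ranges(total_rows: int, max_rows_per_segment: int = 50):
    """Return 1-based inclusive (start, end) row windows of at most max_rows_per_segment rows."""
    if max_rows_per_segment < 1:
        raise ValueError("max_rows_per_segment must be at least 1")
    return [
        (start, min(start + max_rows_per_segment - 1, total_rows))
        for start in range(1, total_rows + 1, max_rows_per_segment)
    ]


print(split_row_ranges(120))  # [(1, 50), (51, 100), (101, 120)]
```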

<Note>
  The `exclude_hidden_*` and `exclude_images` toggles only take effect when `exclude_hidden` is `true`. Leave `exclude_hidden` off to keep everything.
</Note>
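
One way to reason about the gating is to resolve a config into the exclusions that would actually apply. The sketch below mirrors the defaults from the Hidden Content table; it models the documented behavior on the client side and is not the server's implementation. `effective_exclusions` is a hypothetical helper.

```python
# Defaults copied from the Hidden Content table.
DEFAULTS = {
    "exclude_hidden": False,
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_images": False,
}
GATED = ("exclude_hidden_sheets", "exclude_hidden_rows",
         "exclude_hidden_cols", "exclude_images")


def effective_exclusions(config=None):
    """Return which gated toggles are in effect once exclude_hidden is considered."""
    merged = {**DEFAULTS, **(config or {})}
    if not merged["exclude_hidden"]:
        # Master switch off: nothing is dropped, whatever the other toggles say.
        return {name: False for name in GATED}
    return {name: merged[name] for name in GATED}


print(effective_exclusions({"exclude_hidden_rows": True}))
# {'exclude_hidden_sheets': False, 'exclude_hidden_rows': False, 'exclude_hidden_cols': False, 'exclude_images': False}
print(effective_exclusions({"exclude_hidden": True, "exclude_hidden_cols": False}))
# {'exclude_hidden_sheets': True, 'exclude_hidden_rows': True, 'exclude_hidden_cols': False, 'exclude_images': False}
```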

## What an Excel Parse Returns

The response is the same job/chunk/segment shape as a PDF parse (see [Response Format](/document-processing/parsing/response-format)), with workbook content surfaced as `Table` segments. Each sheet's tables come back as `markdown` and `html`, and large tables are split according to `split_large_tables` / `max_rows_per_segment`.

The Excel-specific addition is **`cell_references`** on each segment:

* **`cell_references`:** spreadsheet cell-range references for the segment, each an object of `{ sheet, address, ref }` — the sheet name, the cell or range address (e.g. `Sheet1!B2:D10`), and the referenced value. This is how you trace a parsed table back to exact cells in the workbook.

<Warning>
  `cell_references` is distinct from `references`, which carries research-paper citations for document parses. On Excel segments, spreadsheet cell ranges live in `cell_references`; `references` stays `null`.
</Warning>
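
Once a job has succeeded, you can walk the result to collect every cell range. The sketch below assumes the result exposes a top-level `chunks` list whose items each hold a `segments` list; those two key names are assumptions here (see Response Format for the canonical shape), while `cell_references`, `sheet`, `address`, and `ref` come from this page. The sample payload is invented for illustration.

```python
from collections import defaultdict


def iter_cell_references(result):
    """Yield every cell reference object found on any segment."""
    for chunk in result.get("chunks") or []:            # assumed key name
        for segment in chunk.get("segments") or []:     # assumed key name
            # Treat a missing or null cell_references field as empty.
            for ref in segment.get("cell_references") or []:
                yield ref


def addresses_by_sheet(result):
    """Group range addresses by sheet name."""
    grouped = defaultdict(list)
    for ref in iter_cell_references(result):
        grouped[ref["sheet"]].append(ref["address"])
    return dict(grouped)


# Invented sample, shaped per the assumptions above.
sample = {
    "status": "Succeeded",
    "chunks": [{
        "segments": [
            {"cell_references": [
                {"sheet": "Sheet1", "address": "Sheet1!B2:D10", "ref": "placeholder"},
            ], "references": None},
            {"cell_references": None, "references": None},
        ],
    }],
}

print(addresses_by_sheet(sample))  # {'Sheet1': ['Sheet1!B2:D10']}
```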

## Dig Deeper

<CardGroup cols={2}>
  <Card title="Response Format" icon="brackets-curly" href="/document-processing/parsing/response-format">
    The canonical job/chunk/segment shape shared with PDF parses.
  </Card>

  <Card title="Presigned URLs" icon="link" href="/document-processing/parsing/presigned-urls">
    Parse workbooks straight from cloud storage with a `url` instead of an upload.
  </Card>
</CardGroup>
