POST /parse/excel
Excel files go to POST /parse/excel, not POST /parse. Submitting a .xls or .xlsx file to /parse returns a 400 with error: "excel_not_supported_here", which points you here. Conversely, /parse/excel accepts only Excel files and rejects everything else.
How It Differs from /parse
The Excel endpoint shares the same asynchronous submit-and-poll flow and the same result shape as /parse — you get a job_id, poll GET /parse/{job_id} until Succeeded, and read back chunks of segments. But none of the PDF parsing options carry over: ocr_engine, layout_analysis, segment_processing, agentic_ocr, and the processing modes have no effect on a workbook. Excel parsing has its own, separate set of configuration options (below).
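The submit-and-poll flow described above can be sketched as a small polling helper. This is a minimal sketch, not the official client: the `status` field name and the `Failed` state are assumptions (the text only confirms the `Succeeded` state), and `fetch_job` is a hypothetical callable that performs GET /parse/{job_id} and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job repeatedly until the job reports Succeeded.

    fetch_job is supplied by the caller and is expected to perform
    GET /parse/{job_id} and return the decoded JSON body as a dict.
    """
    for _ in range(max_polls):
        job = fetch_job()
        status = job.get("status")  # assumed field name
        if status == "Succeeded":
            return job
        if status == "Failed":  # assumed name for a terminal failure state
            raise RuntimeError(f"parse job failed: {job}")
        sleep(interval_s)
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

Passing the HTTP call in as a callable keeps the retry logic independent of any particular HTTP library and makes it easy to exercise with canned responses.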
Each sheet is converted to structured tables. The parser preserves the grid, lets you drop hidden content, controls how large tables are split, and tags every table segment with the spreadsheet cells it came from.
Submitting a Workbook
Provide either a file upload or a url pointing to a workbook in cloud storage, not both. Every configuration field is optional; an omitted field falls back to the pipeline’s own default.
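The "one source, not both" rule is easy to enforce on the client before sending anything. The helper below is a hypothetical sketch: the function name is illustrative, and it assumes the form fields are literally named `file` and `url`.

```python
from typing import Optional


def workbook_source(file_path: Optional[str] = None, url: Optional[str] = None) -> dict:
    """Return {"file": path} or {"url": url}; exactly one source must be given."""
    # True when both are missing or both are present, which are the two invalid cases.
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url, not both")
    if file_path is not None:
        return {"file": file_path}
    return {"url": url}
```

With an HTTP client, the `file` entry would typically be opened in binary mode and sent as the multipart file part, while `url` would travel as an ordinary form field.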
Configuration
All fields are optional. Booleans are sent as the strings "true" / "false" in the multipart form.
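Because the form expects booleans as the strings "true" / "false", a small encoding step helps avoid sending Python's `True` / `False` spellings by accident. This is a minimal sketch; the helper name is hypothetical, and treating `None` as "omit the field" is a client-side convention assumed here, not something the API defines.

```python
def encode_form_fields(options: dict) -> dict:
    """Convert Python option values into multipart form strings."""
    encoded = {}
    for name, value in options.items():
        if value is None:
            continue  # omit the field so the server-side default applies
        if isinstance(value, bool):
            # Check bool before anything else: str(True) would give "True".
            encoded[name] = "true" if value else "false"
        else:
            encoded[name] = str(value)
    return encoded
```

For example, `encode_form_fields({"exclude_hidden": True, "max_rows_per_segment": 50})` yields `{"exclude_hidden": "true", "max_rows_per_segment": "50"}`.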
Hidden Content
| Parameter | Type | Default | What it does |
|---|---|---|---|
| exclude_hidden | boolean | false | Drop hidden sheets, rows, columns, and styling from the output (gates the four fields below). |
| exclude_hidden_sheets | boolean | true | When excluding hidden content, also drop hidden sheets. |
| exclude_hidden_rows | boolean | true | When excluding hidden content, also drop hidden rows. |
| exclude_hidden_cols | boolean | true | When excluding hidden content, also drop hidden columns. |
| exclude_images | boolean | false | When excluding hidden content, also drop embedded/pasted images. |
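
The gating in the table above can be modelled on the client to predict what a given combination of flags will drop. This is an illustrative sketch of the documented defaults and the `exclude_hidden` gate, not the service's implementation; the function name and the keys of the returned dict are hypothetical.

```python
# Defaults copied from the Hidden Content table.
HIDDEN_DEFAULTS = {
    "exclude_hidden": False,
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_images": False,
}


def effective_hidden_settings(options: dict) -> dict:
    """Return which kinds of content would be dropped for the given options."""
    merged = {**HIDDEN_DEFAULTS, **options}
    gate = merged["exclude_hidden"]  # the four toggles do nothing unless this is on
    return {
        "sheets": gate and merged["exclude_hidden_sheets"],
        "rows": gate and merged["exclude_hidden_rows"],
        "cols": gate and merged["exclude_hidden_cols"],
        "images": gate and merged["exclude_images"],
    }
```

Under this model, setting only `exclude_images` to true drops nothing, while setting only `exclude_hidden` to true drops hidden sheets, rows, and columns but keeps images.
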
Tables
| Parameter | Type | Default | What it does |
|---|---|---|---|
| split_large_tables | boolean | true | Break big tables into smaller segments so each stays a manageable size. |
| max_rows_per_segment | integer | 50 | Maximum rows per segment when split_large_tables is enabled. |
| table_clustering | string | "accurate" | How aggressively to group adjacent ranges into tables: accurate (full analysis, best but slower), fast, or off. |
The exclude_hidden_* and exclude_images toggles only take effect when exclude_hidden is true. Leave exclude_hidden off to keep everything.
What an Excel Parse Returns
The response is the same job/chunk/segment shape as a PDF parse (see Response Format), with workbook content surfaced as Table segments. Each sheet’s tables come back as markdown and html, and large tables are split according to split_large_tables / max_rows_per_segment.
The Excel-specific addition is cell_references on each segment:
cell_references: spreadsheet cell-range references for the segment, each an object of the form { sheet, address, ref }: the sheet name, the cell or range address (e.g. Sheet1!B2:D10), and the referenced value. This is how you trace a parsed table back to exact cells in the workbook.
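To trace a segment back to the workbook programmatically, the address string can be broken into a sheet name and numeric bounds. The parser below is a minimal sketch that assumes addresses follow standard A1 notation as in the Sheet1!B2:D10 example; `CellRange` and `parse_address` are hypothetical names, and the handling of quoted sheet names and `$` markers is an assumption about inputs the page does not describe.

```python
import re
from typing import NamedTuple


class CellRange(NamedTuple):
    sheet: str
    first_row: int  # 1-based, as in the spreadsheet
    first_col: int  # 1-based: A=1, B=2, ...
    last_row: int
    last_col: int


_CELL = re.compile(r"^\$?([A-Za-z]{1,3})\$?(\d+)$")


def _parse_cell(cell: str) -> tuple:
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"not an A1-style cell: {cell!r}")
    col = 0
    for ch in match.group(1).upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base-26 column letters
    return int(match.group(2)), col


def parse_address(address: str, default_sheet: str = "") -> CellRange:
    """Split an address like 'Sheet1!B2:D10' into sheet and numeric bounds."""
    sheet, sep, cells = address.rpartition("!")
    if not sep:
        sheet = default_sheet  # no sheet prefix; fall back to the `sheet` field
    sheet = sheet.strip("'")  # Excel quotes sheet names that contain spaces
    start, _, end = cells.partition(":")
    first_row, first_col = _parse_cell(start)
    last_row, last_col = _parse_cell(end or start)  # single cell: end == start
    return CellRange(sheet, first_row, first_col, last_row, last_col)
```

For the example address, `parse_address("Sheet1!B2:D10")` gives sheet `Sheet1`, rows 2 through 10, and columns 2 through 4.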
Dig Deeper
Response Format
The canonical job/chunk/segment shape shared with PDF parses.
Presigned URLs
Parse workbooks straight from cloud storage with a url instead of an upload.
