Spreadsheets don’t behave like documents. There are no pages to lay out, no reading order to reconstruct, and no OCR to run — a workbook is a grid of typed cells spread across one or more sheets, often with hidden rows, formulas, merged ranges, and styling that carry meaning. Excel parsing has its own pipeline and its own endpoint: POST /parse/excel.
Excel files go to POST /parse/excel, not POST /parse. Submitting a .xls or .xlsx to /parse returns a 400 with error: "excel_not_supported_here" pointing you here. Conversely, /parse/excel only accepts Excel files and rejects everything else.

How It Differs from /parse

The Excel endpoint shares the same asynchronous submit-and-poll flow and the same result shape as /parse — you get a job_id, poll GET /parse/{job_id} until Succeeded, and read back chunks of segments. But none of the PDF parsing options carry over: ocr_engine, layout_analysis, segment_processing, agentic_ocr, and the processing modes have no effect on a workbook. Excel parsing has its own, separate set of configuration options (below). Each sheet is converted to structured tables. The parser preserves the grid, lets you drop hidden content, controls how large tables are split, and tags every table segment with the spreadsheet cells it came from.

Submitting a Workbook

Provide either a file upload or a url to a workbook in cloud storage — not both. Every configuration field is optional and defaults to the pipeline’s own default.
import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

with open("workbook.xlsx", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/parse/excel",
        headers={"api-key": API_KEY},
        files={"file": ("workbook.xlsx", f,
                        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")},
        data={
            "exclude_hidden": "true",    # drop hidden sheets/rows/cols
            "split_large_tables": "true",
            "max_rows_per_segment": "50",
        },
    )
response.raise_for_status()
job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

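# Poll until the job reaches a terminal state.
while True:
    poll = requests.get(
        f"{BASE_URL}/parse/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors before reading "status"
    result = poll.json()
    if result["status"] == "Succeeded":
        break
    if result["status"] == "Failed":
        raise RuntimeError(result.get("message", "parse job failed"))
    time.sleep(5)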

with open("result.json", "w") as f:
    json.dump(result, f, indent=2)
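
The upload above has a URL-based counterpart. The following is a minimal sketch of it, not a confirmed request format: it assumes the workbook location is sent as a form field named url in a multipart body, and the helper names build_url_fields and submit_from_url are hypothetical. Option values are passed as strings, as in the upload example.

```python
def build_url_fields(workbook_url, **options):
    """Build multipart form fields for a URL-based submission (no file part).

    Assumption: the endpoint reads the workbook location from a form field
    named "url". Check the API reference before relying on this.
    """
    fields = {"url": workbook_url}
    fields.update(options)
    # A (None, value) tuple makes requests send a plain form field inside a
    # multipart body, with no filename attached.
    return {name: (None, str(value)) for name, value in fields.items()}


def submit_from_url(base_url, api_key, workbook_url, **options):
    """POST the URL to /parse/excel and return the job_id from the response."""
    import requests  # imported here so build_url_fields works without it

    response = requests.post(
        f"{base_url}/parse/excel",
        headers={"api-key": api_key},
        files=build_url_fields(workbook_url, **options),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job_id"]
```

Calling submit_from_url(BASE_URL, API_KEY, presigned_url, exclude_hidden="true") would then return a job_id that can be polled exactly as in the upload example.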

Configuration

All fields are optional. Booleans are sent as the strings "true" / "false" in the multipart form.
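
Because every value travels as a string, it can help to encode options in one place. A small sketch, assuming you keep your settings as native Python values; encode_form_options is a hypothetical helper, not part of any SDK:

```python
def encode_form_options(options):
    """Convert Python values into the string form fields a multipart body carries."""
    encoded = {}
    for name, value in options.items():
        if value is None:
            continue  # leave the field out so the pipeline default applies
        # bool must be checked first: str(True) would give "True", not "true".
        if isinstance(value, bool):
            encoded[name] = "true" if value else "false"
        else:
            encoded[name] = str(value)
    return encoded


form_data = encode_form_options({
    "exclude_hidden": True,
    "max_rows_per_segment": 50,
    "table_clustering": "fast",
    "exclude_images": None,  # omitted from the request
})
```

The resulting dictionary can be passed as the data argument of requests.post alongside the file upload.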

Hidden Content

| Parameter | Type | Default | What it does |
| --- | --- | --- | --- |
| exclude_hidden | boolean | false | Drop hidden sheets, rows, columns, and styling from the output (gates the four fields below). |
| exclude_hidden_sheets | boolean | true | When excluding hidden content, also drop hidden sheets. |
| exclude_hidden_rows | boolean | true | When excluding hidden content, also drop hidden rows. |
| exclude_hidden_cols | boolean | true | When excluding hidden content, also drop hidden columns. |
| exclude_images | boolean | false | When excluding hidden content, also drop embedded/pasted images. |

Tables

| Parameter | Type | Default | What it does |
| --- | --- | --- | --- |
| split_large_tables | boolean | true | Break big tables into smaller segments so each stays a manageable size. |
| max_rows_per_segment | integer | 50 | Maximum rows per segment when split_large_tables is enabled. |
| table_clustering | string | "accurate" | How aggressively to group adjacent ranges into tables: accurate (full analysis, best but slower), fast, or off. |

The exclude_hidden_* and exclude_images toggles only take effect when exclude_hidden is true. Leave exclude_hidden off to keep everything.
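
The gating rule can be modeled client-side to check what a given configuration should drop. This is a sketch of the documented behavior using the documented defaults, not the service's actual implementation; HiddenContentConfig and effective_exclusions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HiddenContentConfig:
    # Defaults mirror the Hidden Content parameter table.
    exclude_hidden: bool = False
    exclude_hidden_sheets: bool = True
    exclude_hidden_rows: bool = True
    exclude_hidden_cols: bool = True
    exclude_images: bool = False


def effective_exclusions(cfg):
    """Return what would be dropped once the exclude_hidden gate is applied."""
    gate = cfg.exclude_hidden
    return {
        "sheets": gate and cfg.exclude_hidden_sheets,
        "rows": gate and cfg.exclude_hidden_rows,
        "cols": gate and cfg.exclude_hidden_cols,
        "images": gate and cfg.exclude_images,
    }


# With the gate off, the individual toggles have no effect.
nothing_dropped = effective_exclusions(HiddenContentConfig(exclude_images=True))
# With the gate on, each toggle's own value applies.
gate_on = effective_exclusions(HiddenContentConfig(exclude_hidden=True))
```

Under this model, setting only exclude_hidden to true drops hidden sheets, rows, and columns but keeps images, because exclude_images defaults to false.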

What an Excel Parse Returns

The response is the same job/chunk/segment shape as a PDF parse (see Response Format), with workbook content surfaced as Table segments. Each sheet’s tables come back as markdown and html, and large tables are split according to split_large_tables / max_rows_per_segment. The Excel-specific addition is cell_references on each segment:
  • cell_references: spreadsheet cell-range references for the segment, each an object of { sheet, address, ref } — the sheet name, the cell or range address (e.g. Sheet1!B2:D10), and the referenced value. This is how you trace a parsed table back to exact cells in the workbook.
cell_references is distinct from references, which carries research-paper citations for document parses. On Excel segments, spreadsheet cell ranges live in cell_references; references stays null.
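
To trace parsed tables back to the workbook, you can walk the segments and collect their cell_references. The sketch below assumes each chunk is a dictionary with a segments list and that you have already located the list of chunks in the job result (see Response Format for the exact path); the sample data is invented for illustration and iter_cell_references is a hypothetical helper.

```python
def iter_cell_references(chunks):
    """Yield (sheet, address, ref) for every cell reference in the given chunks.

    Assumption: chunks is a list of dicts, each with a "segments" list whose
    items may carry "cell_references" (a list of {sheet, address, ref} objects).
    """
    for chunk in chunks:
        for segment in chunk.get("segments", []):
            # cell_references may be absent or null on some segments.
            for ref in segment.get("cell_references") or []:
                yield ref["sheet"], ref["address"], ref["ref"]


# Invented sample data, shaped after the fields described above.
sample_chunks = [
    {
        "segments": [
            {
                "cell_references": [
                    {"sheet": "Sheet1", "address": "Sheet1!B2:D10", "ref": "example value"},
                ],
                "references": None,
            },
            {"cell_references": None},
        ]
    }
]

found = list(iter_cell_references(sample_chunks))
```

Each tuple gives the sheet name and range needed to open the workbook and inspect the source cells directly.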

Dig Deeper

Response Format

The canonical job/chunk/segment shape shared with PDF parses.

Presigned URLs

Parse workbooks straight from cloud storage with a url instead of an upload.