Excel Parsing - Unsiloed AI

Spreadsheets don’t behave like documents. There are no pages to lay out, no reading order to reconstruct, and no OCR to run — a workbook is a grid of typed cells spread across one or more sheets, often with hidden rows, formulas, merged ranges, and styling that carry meaning. Excel parsing has its own pipeline and its own endpoint: POST /parse/excel.

Excel files go to POST /parse/excel, not POST /parse. Submitting a .xls or .xlsx to /parse returns a 400 with error: "excel_not_supported_here" pointing you here. Conversely, /parse/excel only accepts Excel files and rejects everything else.
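
The routing rule above can be applied client-side before any request is made. The sketch below is illustrative only: `endpoint_for` and `EXCEL_EXTENSIONS` are hypothetical names, and it assumes the file extension is a reliable indicator of the file type.

```python
from pathlib import Path

# Extensions the text above names as Excel files.
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def endpoint_for(filename: str) -> str:
    """Pick the endpoint path for a file, based only on its extension."""
    suffix = Path(filename).suffix.lower()
    return "/parse/excel" if suffix in EXCEL_EXTENSIONS else "/parse"
```

Routing up front avoids a round trip that would only come back as a 400 with `excel_not_supported_here`.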

How It Differs from `/parse`

The Excel endpoint shares the same asynchronous submit-and-poll flow and the same result shape as /parse — you get a job_id, poll GET /parse/{job_id} until Succeeded, and read back chunks of segments. But none of the PDF parsing options carry over: ocr_engine, layout_analysis, segment_processing, agentic_ocr, and the processing modes have no effect on a workbook. Excel parsing has its own, separate set of configuration options (below). Each sheet is converted to structured tables. The parser preserves the grid, lets you drop hidden content, controls how large tables are split, and tags every table segment with the spreadsheet cells it came from.

Submitting a Workbook

Provide either a file upload or a url to a workbook in cloud storage — not both. Every configuration field is optional and defaults to the pipeline’s own default.

import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

with open("workbook.xlsx", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/parse/excel",
        headers={"api-key": API_KEY},
        files={"file": ("workbook.xlsx", f,
                        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")},
        data={
            "exclude_hidden": "true",    # drop hidden sheets/rows/cols
            "split_large_tables": "true",
            "max_rows_per_segment": "50",
        },
    )
response.raise_for_status()
job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

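# Poll until the job reaches a terminal state.
while True:
    poll = requests.get(
        f"{BASE_URL}/parse/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors instead of a KeyError below
    result = poll.json()
    if result["status"] == "Succeeded":
        break
    if result["status"] == "Failed":
        raise RuntimeError(result.get("message", "parse job failed"))
    time.sleep(5)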

with open("result.json", "w") as f:
    json.dump(result, f, indent=2)
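
The example above uploads a file. For the other option, a `url`, the either-or rule can be checked before the request is built. This is a sketch under assumptions: `build_request` is a hypothetical helper, and it assumes the `url` is sent as a form field named `url` alongside the configuration fields, which this page does not spell out.

```python
import os
from typing import Optional

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_request(config: dict,
                  file_path: Optional[str] = None,
                  url: Optional[str] = None) -> dict:
    """Return keyword arguments for requests.post with exactly one source."""
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url")
    if url is not None:
        # Assumption: the url travels as a form field named "url".
        return {"data": {**config, "url": url}}
    # The caller should close this handle once the request has been sent.
    handle = open(file_path, "rb")
    name = os.path.basename(file_path)
    return {"data": dict(config), "files": {"file": (name, handle, XLSX_MIME)}}
```

The returned dictionary can be unpacked into the call, for example `requests.post(f"{BASE_URL}/parse/excel", headers={"api-key": API_KEY}, **build_request(config, url=workbook_url))`.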

Configuration

All fields are optional. Booleans are sent as the strings "true" / "false" in the multipart form.
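
Since booleans travel as strings, it can help to keep the configuration as native Python values and convert it just before sending. A minimal sketch, with `to_form_fields` as a hypothetical helper name:

```python
def to_form_fields(config: dict) -> dict:
    """Convert native Python values into multipart form strings."""
    fields = {}
    for key, value in config.items():
        # bool must be checked first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            fields[key] = "true" if value else "false"
        else:
            fields[key] = str(value)
    return fields
```

The result can be passed as the `data` argument of `requests.post`.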

Hidden Content

| Parameter | Type | Default | What it does |
| --- | --- | --- | --- |
| `exclude_hidden` | boolean | `false` | Drop hidden sheets, rows, columns, and styling from the output (gates the four fields below). |
| `exclude_hidden_sheets` | boolean | `true` | When excluding hidden content, also drop hidden sheets. |
| `exclude_hidden_rows` | boolean | `true` | When excluding hidden content, also drop hidden rows. |
| `exclude_hidden_cols` | boolean | `true` | When excluding hidden content, also drop hidden columns. |
| `exclude_images` | boolean | `false` | When excluding hidden content, also drop embedded/pasted images. |

Tables

| Parameter | Type | Default | What it does |
| --- | --- | --- | --- |
| `split_large_tables` | boolean | `true` | Break big tables into smaller segments so each stays a manageable size. |
| `max_rows_per_segment` | integer | `50` | Maximum rows per segment when `split_large_tables` is enabled. |
| `table_clustering` | string | `"accurate"` | How aggressively to group adjacent ranges into tables: `accurate` (full analysis, best but slower), `fast`, or `off`. |
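
To get a feel for how `max_rows_per_segment` affects output size, the sketch below chunks a list of rows sequentially. It is an approximation, not the parser's implementation: `split_rows` is a hypothetical function, and whether header rows are repeated or counted toward the limit is not stated here.

```python
def split_rows(rows: list, max_rows_per_segment: int = 50) -> list:
    """Split rows into consecutive chunks of at most max_rows_per_segment."""
    if max_rows_per_segment < 1:
        raise ValueError("max_rows_per_segment must be at least 1")
    return [
        rows[start:start + max_rows_per_segment]
        for start in range(0, len(rows), max_rows_per_segment)
    ]
```

Under this simple model, a 120-row table with the default limit yields three segments of 50, 50, and 20 rows.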

The exclude_hidden_* and exclude_images toggles only take effect when exclude_hidden is true. Leave exclude_hidden off to keep everything.
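
The gating rule can be modeled as a small function that reports which categories would actually be dropped for a given configuration. This is one reading of the defaults documented on this page, not the service's own logic; `DEFAULTS` and `effective_exclusions` are hypothetical names.

```python
from typing import Optional

# Defaults as documented for the hidden-content parameters.
DEFAULTS = {
    "exclude_hidden": False,
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_images": False,
}


def effective_exclusions(config: Optional[dict] = None) -> dict:
    """Return which toggles take effect once the exclude_hidden gate is applied."""
    merged = {**DEFAULTS, **(config or {})}
    gate = merged["exclude_hidden"]
    return {
        name: bool(gate and merged[name])
        for name in DEFAULTS
        if name != "exclude_hidden"
    }
```

With no configuration at all, every entry is `False`. Setting only `exclude_hidden` to true drops hidden sheets, rows, and columns but keeps images.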

What an Excel Parse Returns

The response is the same job/chunk/segment shape as a PDF parse (see Response Format), with workbook content surfaced as Table segments. Each sheet’s tables come back as markdown and html, and large tables are split according to split_large_tables / max_rows_per_segment. The Excel-specific addition is cell_references on each segment:

cell_references: spreadsheet cell-range references for the segment, each an object of { sheet, address, ref } — the sheet name, the cell or range address (e.g. Sheet1!B2:D10), and the referenced value. This is how you trace a parsed table back to exact cells in the workbook.

cell_references is distinct from references, which carries research-paper citations for document parses. On Excel segments, spreadsheet cell ranges live in cell_references; references stays null.
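
A short sketch of collecting cell references from a finished job follows. The key names `chunks` and `segments`, and their position at the top level of the result, are assumptions based on the job/chunk/segment description; check Response Format for the exact layout. `iter_cell_references` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def iter_cell_references(result: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (sheet, address, ref) for every cell reference in a result."""
    # Assumed layout: result["chunks"][i]["segments"][j]["cell_references"].
    for chunk in result.get("chunks") or []:
        for segment in chunk.get("segments") or []:
            # cell_references may be missing or null on some segments.
            for cref in segment.get("cell_references") or []:
                yield cref.get("sheet"), cref.get("address"), cref.get("ref")
```

The `or []` fallbacks keep the loop safe when a key is absent or null.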

Dig Deeper

Response Format

The canonical job/chunk/segment shape shared with PDF parses.

Presigned URLs

Parse workbooks straight from cloud storage with a url instead of an upload.
