Parse Excel - Unsiloed AI

curl -X POST 'https://prod.visionapi.unsiloed.ai/parse/excel' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@workbook.xlsx;type=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' \
  -F 'cell_metadata=true' \
  -F 'formulas=true' \
  -F 'split_large_tables=true' \
  -F 'max_rows_per_segment=100' \
  -F 'table_clustering=accurate'

# Alternative: presigned / public URL instead of file upload
# -F 'url=https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...'

{
  "job_id": "9b1f7a04-2c33-4f8e-9c92-6f8a2e84b3d1",
  "status": "Starting",
  "file_name": "workbook.xlsx",
  "created_at": "2026-06-17T14:22:08.901234Z",
  "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "credit_used": 3,
  "quota_remaining": 23692,
  "merge_tables": false
}

POST /parse/excel

Overview

The Parse Excel endpoint processes Excel workbooks (.xls, .xlsx) using a dedicated spreadsheet pipeline. It shares auth, billing, quota, and rate-limit infrastructure with Parse Document, and its jobs are polled through the same GET /parse/{job_id} endpoint.

1. POST to /parse/excel with your file (or url) and any spreadsheet-specific configuration.
2. The job is automatically enqueued for processing.
3. Poll GET /parse/{job_id} to track progress and retrieve results.
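
The submission step above can be sketched in Python as follows. This is a minimal sketch rather than an official client: it assumes the third-party requests library, reuses the host and api-key header from the cURL example, and the helper names (encode_form_fields, submit_workbook) are made up for illustration. Booleans are sent as the lowercase strings true/false, mirroring the cURL form fields.

```python
import os

BASE_URL = "https://prod.visionapi.unsiloed.ai"  # host taken from the cURL example
# MIME type for .xlsx, as in the cURL example; legacy .xls files use application/vnd.ms-excel.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def encode_form_fields(options):
    """Turn Python values into multipart form strings (True -> "true", 100 -> "100")."""
    fields = {}
    for name, value in options.items():
        if isinstance(value, bool):  # checked first because bool is a subclass of int
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields


def submit_workbook(path, api_key, **options):
    """POST the workbook and return the job_id to poll with GET /parse/{job_id}."""
    import requests  # third-party; imported here so encode_form_fields has no dependencies

    with open(path, "rb") as fh:
        response = requests.post(
            f"{BASE_URL}/parse/excel",
            headers={"api-key": api_key, "accept": "application/json"},
            files={"file": (os.path.basename(path), fh, XLSX_MIME)},
            data=encode_form_fields(options),
            timeout=60,  # illustrative value, not a documented limit
        )
    response.raise_for_status()
    return response.json()["job_id"]
```

A call such as submit_workbook("workbook.xlsx", "your-api-key", cell_metadata=True, max_rows_per_segment=100) would send the same fields as the cURL example, minus the ones left at their defaults.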

PDF parsing options (ocr_strategy, layout_analysis, segment_processing, etc.) do not apply here — they are silently ignored. Non-Excel uploads are rejected with 400; submit those to POST /parse instead.

Request

Provide either file (multipart binary upload) or url (presigned/public URL). The file field is multipart-only; JSON callers must use url.

file

Excel workbook to process. Supported formats: .xls, .xlsx. Required if url is not provided.

url

string

Presigned or public URL of the workbook to fetch and process. Required if file is not provided.
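
Because JSON callers must use url, a URL-based submission might look like the sketch below. It uses only the standard library and builds the request separately from sending it. Two assumptions are made here that the text above does not confirm: that the JSON body is sent with Content-Type: application/json, and that config fields sit at the top level of the body next to url. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://prod.visionapi.unsiloed.ai"  # host taken from the cURL example


def build_url_request(workbook_url, api_key, **options):
    """Build (but do not send) a JSON POST to /parse/excel for a hosted workbook."""
    # Assumption: config fields are top-level keys alongside "url".
    body = {"url": workbook_url, **options}
    return urllib.request.Request(
        f"{BASE_URL}/parse/excel",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api-key": api_key,
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_url(workbook_url, api_key, **options):
    """Send the request and return the decoded job-creation response."""
    request = build_url_request(workbook_url, api_key, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```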

Cell metadata

cell_metadata

boolean

Include per-cell color, formula, and dropdown metadata in the output. Defaults to false. The three sub-toggles below are only honored when this is true.

cell_colors

boolean

Include cell background and font colors. Only effective when cell_metadata is true. Defaults to true.

formulas

boolean

Include the underlying cell formulas alongside their computed values. Only effective when cell_metadata is true. Defaults to true.

dropdowns

boolean

Include data-validation dropdown options for each cell. Only effective when cell_metadata is true. Defaults to true.
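
The interaction between cell_metadata and its three sub-toggles can be modeled client-side as below. This is only a sketch of the rules documented above, not the server's implementation; the function and constant names are hypothetical.

```python
# Documented defaults for the sub-toggles; they only matter when cell_metadata is true.
CELL_METADATA_DEFAULTS = {"cell_colors": True, "formulas": True, "dropdowns": True}


def effective_cell_metadata(options):
    """Return which kinds of cell metadata the documented rules would include."""
    if not options.get("cell_metadata", False):  # master switch defaults to false
        return {name: False for name in CELL_METADATA_DEFAULTS}
    return {
        name: options.get(name, default)
        for name, default in CELL_METADATA_DEFAULTS.items()
    }
```

For example, sending formulas=true without cell_metadata=true yields no formula metadata, while cell_metadata=true with cell_colors=false keeps formulas and dropdowns but omits colors.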

Hidden content

exclude_hidden

boolean

Drop hidden content from the output. Defaults to false. The five sub-toggles below are only honored when this is true.

exclude_hidden_sheets

boolean

When excluding hidden content, also drop entire hidden sheets. Defaults to true.

exclude_hidden_rows

boolean

When excluding hidden content, also drop hidden rows from visible sheets. Defaults to true.

exclude_hidden_cols

boolean

When excluding hidden content, also drop hidden columns from visible sheets. Defaults to true.

exclude_styling

boolean

When excluding hidden content, also drop styling. Defaults to true.

exclude_images

boolean

When excluding hidden content, also drop embedded and pasted images. Defaults to false.
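
To see which exclusions a given option set would activate, the defaults above can be resolved as in this sketch. It mirrors the documented rules only; the helper name is hypothetical and the server remains the source of truth. Note that exclude_images is the one sub-toggle that stays off unless requested explicitly.

```python
# Documented defaults; all of them are ignored unless exclude_hidden is true.
HIDDEN_DEFAULTS = {
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_styling": True,
    "exclude_images": False,
}


def active_exclusions(options):
    """List the sub-toggles in effect for a set of request options."""
    if not options.get("exclude_hidden", False):  # master switch defaults to false
        return []
    return [
        name
        for name, default in HIDDEN_DEFAULTS.items()
        if options.get(name, default)
    ]
```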

Table extraction

split_large_tables

boolean

Split large tables into smaller segments to keep individual response items manageable. Defaults to true.

max_rows_per_segment

integer

Maximum number of rows in each split segment. Only effective when split_large_tables is true. Defaults to 50.
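
As a rough planning aid, the number of segments a table produces can be estimated with a ceiling division. This sketch assumes a table is cut into consecutive segments of at most max_rows_per_segment rows and that every row counts toward the limit; the description above does not say how header rows are treated, so treat the result as an estimate.

```python
import math


def estimated_segments(table_rows, max_rows_per_segment=50, split_large_tables=True):
    """Estimate how many segments a table of table_rows rows would be split into."""
    if table_rows <= 0:
        return 0
    if not split_large_tables:
        return 1  # the table is returned as a single item
    return math.ceil(table_rows / max_rows_per_segment)
```

Under these assumptions a 120-row table gives 3 segments at the default of 50, and 2 segments with max_rows_per_segment=100.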

table_clustering

string

How aggressively to detect distinct logical tables on the same sheet.

"accurate" (default): Best fidelity at the cost of latency.
"fast": Quicker clustering, may merge nearby tables.
"off": Treat each sheet as a single table.

Lifecycle

expires_in

integer

Reserved field. Persisted on the task configuration but currently has no effect on retention — Excel tasks are not auto-deleted. To get a presigned-upload TTL for PDFs and other documents, use POST /v2/parse/upload instead.

Response

The endpoint returns HTTP 200 with the same envelope as POST /parse:

job_id

string

required

Job identifier. Pass this to GET /parse/{job_id} to poll for results.

status

string

required

Initial job status. Always "Starting" on creation.

file_name

string

required

Name of the uploaded workbook. For URL submissions this is the last path segment of the URL, or "unknown" when no usable segment exists.
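
The URL rule can be approximated client-side as shown below, for instance to predict the file_name a submission will report. This is a sketch of the documented behavior only; details such as percent-decoding or trailing slashes are not specified above, so the server may differ in edge cases. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def file_name_from_url(url):
    """Return the last path segment of the URL, or "unknown" if there is none."""
    path = urlsplit(url).path  # drops the query string, e.g. a presigned signature
    segment = path.rsplit("/", 1)[-1]
    return segment or "unknown"
```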

created_at

string

required

ISO 8601 timestamp when the job was created.

message

string

required

Human-readable status message with a polling hint.

credit_used

integer

required

Number of credits deducted from your quota for this job.

quota_remaining

integer

required

Remaining quota after this job was deducted.

merge_tables

boolean

required

Reflects the table-merging flag stored on the job. Always false for Excel jobs — table merging is a PDF-only feature.
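
For typed access to the fields above, the envelope can be loaded into a small dataclass. This is an illustrative sketch; the class and function names are not part of the API, and the sample payload in the usage note is the example response from this page.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExcelJobCreated:
    job_id: str
    status: str
    file_name: str
    created_at: str
    message: str
    credit_used: int
    quota_remaining: int
    merge_tables: bool


def parse_job_created(payload):
    """Build an ExcelJobCreated from a decoded JSON body.

    Raises KeyError if a required field is missing; unknown extra fields are ignored.
    """
    return ExcelJobCreated(**{f.name: payload[f.name] for f in fields(ExcelJobCreated)})
```

Usage: parse_job_created(json.loads(body)).job_id gives the identifier to pass to GET /parse/{job_id}.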

Retrieving Results

Use GET /parse/{job_id} (the shared polling endpoint) to check status and retrieve results. The result envelope is the same as for PDF jobs — chunks containing segments — and Excel segments include a cell_references field linking each segment back to its source sheet, address, and range.

cURL

curl -X GET "https://prod.visionapi.unsiloed.ai/parse/{job_id}" \
  -H "accept: application/json" \
  -H "api-key: your-api-key"

Python

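import time

import requests


def get_excel_results(job_id, api_key, poll_interval=5, max_wait=600):
    """Poll GET /parse/{job_id} until the job succeeds, fails, or max_wait elapses.

    poll_interval and max_wait are in seconds; the defaults are illustrative.
    """
    headers = {"api-key": api_key, "accept": "application/json"}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"
    deadline = time.monotonic() + max_wait

    while time.monotonic() < deadline:
        # The request timeout keeps a stalled connection from blocking the loop.
        response = requests.get(status_url, headers=headers, timeout=30)
        response.raise_for_status()
        job = response.json()
        print(f"Status: {job['status']}")

        if job["status"] == "Succeeded":
            return job
        if job["status"] == "Failed":
            raise RuntimeError(f"Job failed: {job.get('message')}")

        time.sleep(poll_interval)

    raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")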

See Get Parse Job Status for the full response schema and query parameters.
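
Once a job has succeeded, the cell_references of each Excel segment can be collected with a small generator such as the one below. This is a sketch under assumptions: it takes the list of chunks as input because where that list sits in the result envelope is not described here, it assumes each chunk holds its segments under a "segments" key, and it returns cell_references untouched because its inner layout is not documented on this page.

```python
def iter_cell_references(chunks):
    """Yield (segment, cell_references) pairs for segments that carry references."""
    for chunk in chunks:
        # Assumption: segments live under a "segments" key on each chunk.
        for segment in chunk.get("segments", []):
            refs = segment.get("cell_references")
            if refs is not None:
                yield segment, refs
```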

Error Handling

Status	Cause	Action
`400`	Missing `file`/`url`, non-Excel file type, or malformed parameters	Check the file extension and required fields
`401`	Missing or invalid `api-key`	Check your API key
`402`	Insufficient quota	Add credits to your account or renew your plan
`403`	Access has been revoked	Contact support
`429`	Rate limit (default 10 req/s) or billing usage cap hit	Back off and retry after the `Retry-After` header value
`500`	Internal server error	Retry with exponential backoff
`503`	Job queue at capacity	Retry after the duration indicated in the `Retry-After` header
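
The retry guidance in the table can be expressed as a small delay calculator. This is a sketch, not a prescribed policy: the base delay and cap are illustrative values, only the numeric (seconds) form of Retry-After is parsed, and an HTTP-date value falls back to exponential backoff. It expects a mapping that already matches the header's casing; the headers object returned by requests is case-insensitive.

```python
RETRYABLE_STATUSES = {429, 500, 503}  # statuses the table above says to retry


def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * 2 ** attempt)  # exponential backoff, capped
```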

Authorizations

Authorization

string

header

required

API key for authentication. Use 'Bearer <your_api_key>'. The request examples on this page send the key in the api-key header instead.

Body

Request body for POST /parse/excel (multipart/form-data).

Provide either file (binary .xls/.xlsx upload, multipart only) or url (presigned/public URL, accepted with both content types), not both. Excel parsing has its own configuration: none of the PDF parsing options (OCR, layout analysis, segment processing) apply. Every config field is optional and defaults to the Excel pipeline's own default.

file

required

Excel workbook to process. Required if url is not provided. Supported: XLS, XLSX.

cell_colors

boolean | null

default:true

Include cell colors (only when cell_metadata). Defaults to true.

cell_metadata

boolean | null

default:false

Include cell color, formula, and dropdown metadata in the output. Defaults to false.

dropdowns

boolean | null

default:true

Include data-validation dropdown options (only when cell_metadata). Defaults to true.

exclude_hidden

boolean | null

default:false

Drop hidden sheets/rows/cols/styling from the output. Defaults to false.

exclude_hidden_cols

boolean | null

default:true

When excluding hidden content, also drop hidden columns. Defaults to true.

exclude_hidden_rows

boolean | null

default:true

When excluding hidden content, also drop hidden rows. Defaults to true.

exclude_hidden_sheets

boolean | null

default:true

When excluding hidden content, also drop hidden sheets. Defaults to true.

exclude_images

boolean | null

default:false

When excluding hidden content, also drop embedded/pasted images. Defaults to false.

exclude_styling

boolean | null

default:true

When excluding hidden content, also drop styling. Defaults to true.

expires_in

integer<int32> | null

Reserved field. Persisted in the task configuration but currently has no effect on retention — Excel tasks use the same Task::new_fast creation path as POST /parse, which does not set the task's expires_at column. See the same note on ParseCreateRequest.expires_in.

formulas

boolean | null

default:true

Include cell formulas (only when cell_metadata). Defaults to true.

max_rows_per_segment

integer<int32> | null

default:50

Max rows per split segment when split_large_tables. Defaults to 50.

split_large_tables

boolean | null

default:true

Split large tables into smaller segments. Defaults to true.

table_clustering

string | null

default:accurate

Table clustering effort: accurate (default), fast, or off.

url

string | null

Presigned or public URL of the workbook to fetch. Required if file is not provided.

Response

Job created — poll with GET /parse/{job_id} to retrieve results.

Response body for a successful POST /parse/excel call. It uses the same envelope as POST /parse.

created_at

string

required

ISO 8601 timestamp when the job was created.

credit_used

integer<int32>

required

Number of credits deducted from your quota for this job.

file_name

string

required

Name of the uploaded file. For URL submissions this is the last path segment of the URL, or "unknown" when no usable segment exists.

job_id

string

required

Job identifier — pass this to GET /parse/{job_id} to poll for results.

merge_tables

boolean

required

Reflects the table-merging flag stored on the job. Always false for Excel jobs, because table merging is a PDF-only feature.

message

string

required

Human-readable status message with a polling hint.

quota_remaining

integer<int64>

required

Remaining quota after this job was deducted.

status

string

required

Initial job status. Always "Starting" on creation.
