POST /parse/excel
curl -X POST 'https://prod.visionapi.unsiloed.ai/parse/excel' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@workbook.xlsx;type=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' \
  -F 'cell_metadata=true' \
  -F 'formulas=true' \
  -F 'split_large_tables=true' \
  -F 'max_rows_per_segment=100' \
  -F 'table_clustering=accurate'

# Alternative: presigned / public URL instead of file upload
# -F 'url=https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...'
{
  "job_id": "9b1f7a04-2c33-4f8e-9c92-6f8a2e84b3d1",
  "status": "Starting",
  "file_name": "workbook.xlsx",
  "created_at": "2026-06-17T14:22:08.901234Z",
  "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "credit_used": 3,
  "quota_remaining": 23692,
  "merge_tables": false
}

Overview

The Parse Excel endpoint processes Excel workbooks (.xls, .xlsx) using a dedicated spreadsheet pipeline. It shares auth, billing, quota, and rate-limit infrastructure with Parse Document, and its jobs are polled through the same GET /parse/{job_id} endpoint. The workflow has three steps:
  1. POST to /parse/excel with your file (or url) and any spreadsheet-specific configuration.
  2. The job is automatically enqueued for processing.
  3. Poll GET /parse/{job_id} to track progress and retrieve results.
PDF parsing options (ocr_strategy, layout_analysis, segment_processing, etc.) do not apply here — they are silently ignored. Non-Excel uploads are rejected with 400; submit those to POST /parse instead.

Request

Provide either file (multipart binary upload) or url (presigned/public URL). The file field is multipart-only; JSON callers must use url.
file
file
Excel workbook to process. Supported formats: .xls, .xlsx. Required if url is not provided.
url
string
Presigned or public URL of the workbook to fetch and process. Required if file is not provided.
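
The either/or rule for file and url can be checked on the client before anything is sent. The following is a minimal sketch rather than an official client: the helper name build_form_fields is hypothetical, and it assumes boolean options travel as the lowercase strings 'true' and 'false', as they do in the cURL example.

```python
def build_form_fields(file_path=None, url=None, **options):
    """Validate the file/url choice and serialize options as form strings.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    # Exactly one source must be given: a local file or a URL, never both.
    if (file_path is None) == (url is None):
        raise ValueError("Provide exactly one of file_path or url")

    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # omit unset options so the server-side default applies
        if isinstance(value, bool):
            # Assumption: booleans are sent as 'true'/'false', as in the cURL example.
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)

    # The file itself is attached separately by the HTTP client; only url is a plain field.
    if url is not None:
        fields["url"] = url
    return fields


fields = build_form_fields(
    url="https://example.com/workbook.xlsx",
    cell_metadata=True,
    max_rows_per_segment=100,
)
print(fields)
```

With the requests library, the returned dictionary can be passed as data=, and a local workbook can be attached through files= when uploading a file instead of a URL.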

Cell metadata

cell_metadata
boolean
Include per-cell color, formula, and dropdown metadata in the output. Defaults to false. The three sub-toggles below are only honored when this is true.
cell_colors
boolean
Include cell background and font colors. Only effective when cell_metadata is true. Defaults to true.
formulas
boolean
Include the underlying cell formulas alongside their computed values. Only effective when cell_metadata is true. Defaults to true.
dropdowns
boolean
Include data-validation dropdown options for each cell. Only effective when cell_metadata is true. Defaults to true.
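
The interaction between cell_metadata and its three sub-toggles can be modeled in a few lines. This sketch only mirrors the defaults and gating described above; the function name effective_cell_metadata is hypothetical, and the real evaluation happens on the server.

```python
def effective_cell_metadata(cell_metadata=False, cell_colors=True,
                            formulas=True, dropdowns=True):
    """Model which metadata kinds end up in the output, per the documented defaults."""
    # Each sub-toggle is honored only when the master switch is on.
    return {
        "cell_colors": cell_metadata and cell_colors,
        "formulas": cell_metadata and formulas,
        "dropdowns": cell_metadata and dropdowns,
    }


# A sub-toggle on its own has no effect because cell_metadata defaults to false.
print(effective_cell_metadata(formulas=True))

# Master switch on, colors explicitly switched off.
print(effective_cell_metadata(cell_metadata=True, cell_colors=False))
```

In practice this means that sending only formulas=true returns no formula data; cell_metadata=true must accompany it.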

Hidden content

exclude_hidden
boolean
Drop hidden content from the output. Defaults to false. The five sub-toggles below are only honored when this is true.
exclude_hidden_sheets
boolean
When excluding hidden content, also drop entire hidden sheets. Defaults to true.
exclude_hidden_rows
boolean
When excluding hidden content, also drop hidden rows from visible sheets. Defaults to true.
exclude_hidden_cols
boolean
When excluding hidden content, also drop hidden columns from visible sheets. Defaults to true.
exclude_styling
boolean
When excluding hidden content, also drop styling. Defaults to true.
exclude_images
boolean
When excluding hidden content, also drop embedded and pasted images. Defaults to false.
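
Because four of the five sub-toggles default to true, enabling exclude_hidden on its own also drops styling. Spelling out every sub-toggle avoids surprises. The sketch below assumes nothing beyond the defaults listed above; hidden_content_options is a hypothetical helper name.

```python
# Sub-toggle defaults as documented above.
HIDDEN_DEFAULTS = {
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_styling": True,
    "exclude_images": False,
}


def hidden_content_options(**overrides):
    """Return an explicit option set for a request that enables exclude_hidden."""
    unknown = set(overrides) - set(HIDDEN_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    # Later keys win, so overrides replace the documented defaults.
    return {"exclude_hidden": True, **HIDDEN_DEFAULTS, **overrides}


# Drop hidden sheets only; keep hidden rows, hidden columns, and styling.
options = hidden_content_options(
    exclude_hidden_rows=False,
    exclude_hidden_cols=False,
    exclude_styling=False,
)
print(options)
```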

Table extraction

split_large_tables
boolean
Split large tables into smaller segments to keep individual response items manageable. Defaults to true.
max_rows_per_segment
integer
Maximum number of rows in each split segment. Only effective when split_large_tables is true. Defaults to 50.
table_clustering
string
How aggressively to detect distinct logical tables on the same sheet.
  • "accurate" (default): Best fidelity at the cost of latency.
  • "fast": Quicker clustering, may merge nearby tables.
  • "off": Treat each sheet as a single table.
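
To get a feel for how max_rows_per_segment affects response size, the number of segments for a single table can be estimated with a ceiling division. This is a rough sketch under a stated assumption: the pipeline fills each segment up to the cap before starting the next one. How header rows are counted or repeated is not specified here, so the result is an estimate, and estimate_segments is a hypothetical name.

```python
import math


def estimate_segments(total_rows, max_rows_per_segment=50, split_large_tables=True):
    """Estimate how many segments one table yields (assumes segments are filled to the cap)."""
    if total_rows <= 0:
        return 0
    if not split_large_tables:
        return 1  # the table is returned as a single segment
    if max_rows_per_segment <= 0:
        raise ValueError("max_rows_per_segment must be positive")
    return math.ceil(total_rows / max_rows_per_segment)


print(estimate_segments(120))                            # default cap of 50
print(estimate_segments(120, max_rows_per_segment=100))  # cap used in the cURL example
```

Under that assumption, a 120-row table yields 3 segments at the default cap of 50 and 2 segments at a cap of 100.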

Lifecycle

expires_in
integer
Reserved field. Persisted on the task configuration but currently has no effect on retention — Excel tasks are not auto-deleted. To get a presigned-upload TTL for PDFs and other documents, use POST /v2/parse/upload instead.

Response

The endpoint returns HTTP 200 with the same envelope as POST /parse:
job_id
string
required
Job identifier. Pass this to GET /parse/{job_id} to poll for results.
status
string
required
Initial job status. Always "Starting" on creation.
file_name
string
required
Name of the uploaded workbook. For URL submissions this is the last path segment of the URL, or "unknown" when no usable segment exists.
created_at
string
required
ISO 8601 timestamp when the job was created.
message
string
required
Human-readable status message with a polling hint.
credit_used
integer
required
Number of credits deducted from your quota for this job.
quota_remaining
integer
required
Remaining quota after this job's credits were deducted.
merge_tables
boolean
required
Reflects the table-merging flag stored on the job. Always false for Excel jobs — table merging is a PDF-only feature.
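
Since every field in the envelope is marked required, the response maps cleanly onto a small typed record. The following is a sketch, not an official model: the class name ExcelJobCreated is hypothetical, and the sample body is the example response shown at the top of this page.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExcelJobCreated:
    """Typed view of the job-creation envelope (hypothetical class name)."""
    job_id: str
    status: str
    file_name: str
    created_at: str
    message: str
    credit_used: int
    quota_remaining: int
    merge_tables: bool

    @classmethod
    def from_json(cls, body: str) -> "ExcelJobCreated":
        data = json.loads(body)
        # Read only the documented fields so unexpected extra keys do not break parsing.
        # A missing required field raises KeyError.
        return cls(**{f.name: data[f.name] for f in fields(cls)})


sample = """{
  "job_id": "9b1f7a04-2c33-4f8e-9c92-6f8a2e84b3d1",
  "status": "Starting",
  "file_name": "workbook.xlsx",
  "created_at": "2026-06-17T14:22:08.901234Z",
  "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "credit_used": 3,
  "quota_remaining": 23692,
  "merge_tables": false
}"""

job = ExcelJobCreated.from_json(sample)
print(job.job_id, job.status, job.credit_used)
```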

Retrieving Results

Use GET /parse/{job_id} (the shared polling endpoint) to check status and retrieve results. The result envelope is the same as for PDF jobs — chunks containing segments — and Excel segments include a cell_references field linking each segment back to its source sheet, address, and range.
cURL
curl -X GET "https://prod.visionapi.unsiloed.ai/parse/{job_id}" \
  -H "accept: application/json" \
  -H "api-key: your-api-key"
Python
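import time

import requests

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def get_excel_results(job_id, api_key, poll_interval=5, max_wait=600):
    """Poll GET /parse/{job_id} until the job succeeds, fails, or max_wait seconds elapse."""
    headers = {"accept": "application/json", "api-key": api_key}
    status_url = f"{BASE_URL}/parse/{job_id}"
    deadline = time.monotonic() + max_wait

    while time.monotonic() < deadline:
        # The request timeout keeps a stalled connection from hanging the loop.
        response = requests.get(status_url, headers=headers, timeout=30)
        response.raise_for_status()
        job = response.json()
        print(f"Status: {job['status']}")

        if job["status"] == "Succeeded":
            return job
        if job["status"] == "Failed":
            raise RuntimeError(f"Job failed: {job.get('message')}")

        time.sleep(poll_interval)

    raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")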
See Get Parse Job Status for the full response schema and query parameters.

Error Handling

Status | Cause | Action
400 | Missing file/url, non-Excel file type, or malformed parameters | Check the file extension and required fields
401 | Missing or invalid api-key | Check your API key
402 | Insufficient quota | Add credits to your account or renew your plan
403 | Access has been revoked | Contact support
429 | Rate limit (default 10 req/s) or billing usage cap hit | Back off and retry after the Retry-After header value
500 | Internal server error | Retry with exponential backoff
503 | Job queue at capacity | Retry after the duration indicated in the Retry-After header
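
The retry guidance in the table can be condensed into one decision function. This is a minimal sketch with assumptions: it treats Retry-After as a number of seconds (HTTP also allows a date there, which this sketch does not parse and instead falls back to backoff), and the base delay and cap are illustrative values, not limits defined by the API. The name retry_delay is hypothetical.

```python
RETRY_AFTER_STATUSES = {429, 503}  # the table says to honor Retry-After for these
BACKOFF_STATUSES = {500}           # the table says to use exponential backoff


def retry_delay(status_code, headers, attempt, base=1.0, cap=60.0):
    """Return the seconds to wait before retrying, or None if the error is not retryable.

    attempt is zero-based; base and cap are illustrative defaults.
    """
    if status_code in RETRY_AFTER_STATUSES:
        # Note: a plain dict lookup is case-sensitive; requests' response headers are not.
        value = headers.get("Retry-After")
        try:
            return float(value)  # assumption: the header carries seconds
        except (TypeError, ValueError):
            pass  # header missing or not numeric: fall back to backoff below

    if status_code in RETRY_AFTER_STATUSES or status_code in BACKOFF_STATUSES:
        return min(cap, base * 2 ** attempt)

    return None  # 400, 401, 402, 403: retrying will not help


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))
print(retry_delay(500, {}, attempt=3))
print(retry_delay(401, {}, attempt=0))
```

A caller would sleep for the returned number of seconds and resend the request, and stop immediately when the function returns None.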

Authorizations

Authorization
string
header
required

API key for authentication. Use 'Bearer <your_api_key>'

Body

Request body for POST /parse/excel (multipart/form-data).

Provide either file (binary .xls/.xlsx upload, multipart only) or url (presigned or public URL, accepted with both multipart and JSON requests), not both. Excel parsing has its own configuration: none of the PDF parsing options (OCR, layout analysis, segment processing) apply. Every config field is optional and falls back to the Excel pipeline's own default.

file
file
required if url is not provided

Excel workbook to process. Supported formats: .xls, .xlsx.

cell_colors
boolean | null
default:true

Include cell colors (only when cell_metadata). Defaults to true.

cell_metadata
boolean | null
default:false

Include cell color, formula, and dropdown metadata in the output. Defaults to false.

dropdowns
boolean | null
default:true

Include data-validation dropdown options (only when cell_metadata). Defaults to true.

exclude_hidden
boolean | null
default:false

Drop hidden sheets/rows/cols/styling from the output. Defaults to false.

exclude_hidden_cols
boolean | null
default:true

When excluding hidden content, also drop hidden columns. Defaults to true.

exclude_hidden_rows
boolean | null
default:true

When excluding hidden content, also drop hidden rows. Defaults to true.

exclude_hidden_sheets
boolean | null
default:true

When excluding hidden content, also drop hidden sheets. Defaults to true.

exclude_images
boolean | null
default:false

When excluding hidden content, also drop embedded/pasted images. Defaults to false.

exclude_styling
boolean | null
default:true

When excluding hidden content, also drop styling. Defaults to true.

expires_in
integer<int32> | null

Reserved field. Persisted in the task configuration but currently has no effect on retention: Excel tasks are created through the same path as POST /parse jobs, which does not set an expiry, so they are not auto-deleted. The same note applies to ParseCreateRequest.expires_in.

formulas
boolean | null
default:true

Include cell formulas (only when cell_metadata). Defaults to true.

max_rows_per_segment
integer<int32> | null
default:50

Max rows per split segment when split_large_tables. Defaults to 50.

split_large_tables
boolean | null
default:true

Split large tables into smaller segments. Defaults to true.

table_clustering
string | null
default:accurate

Table clustering effort: accurate (default), fast, or off.

url
string | null

Presigned or public URL of the workbook to fetch. Required if file is not provided.

Response

Job created — poll with GET /parse/{job_id} to retrieve results.

Response body for a successful POST /parse/excel call. The envelope is the same as for POST /parse.

created_at
string
required

ISO 8601 timestamp when the job was created.

credit_used
integer<int32>
required

Number of credits deducted from your quota for this job.

file_name
string
required

Name of the uploaded file. For URL submissions this is the last path segment of the URL, or "unknown" when no usable segment exists.

job_id
string
required

Job identifier — pass this to GET /parse/{job_id} to poll for results.

merge_tables
boolean
required

Reflects the table-merging flag stored on the job. Always false for Excel jobs, because table merging is a PDF-only feature.

message
string
required

Human-readable status message with a polling hint.

quota_remaining
integer<int64>
required

Remaining quota after this job's credits were deducted.

status
string
required

Initial job status. Always "Starting" on creation.