# Parse Excel

> Dedicated ingestion endpoint for Excel workbooks (.xls / .xlsx) with spreadsheet-specific configuration.

## Overview

The Parse Excel endpoint processes Excel workbooks (`.xls`, `.xlsx`) using a dedicated spreadsheet pipeline. It shares auth, billing, quota, and rate-limit infrastructure with [Parse Document](/api-reference/parser/parse-document), and its jobs are polled through the same `GET /parse/{job_id}` endpoint. A typical request follows three steps:

1. **POST** to `/parse/excel` with your file (or `url`) and any spreadsheet-specific configuration.
2. The job is automatically enqueued for processing.
3. **Poll** `GET /parse/{job_id}` to track progress and retrieve results.

<Note>
  PDF parsing options (`ocr_strategy`, `layout_analysis`, `segment_processing`, etc.) do not apply here — they are silently ignored. Non-Excel uploads are rejected with `400`; submit those to [`POST /parse`](/api-reference/parser/parse-document) instead.
</Note>

## Request

<Note>
  Provide either `file` (multipart binary upload) or `url` (presigned/public URL). The `file` field is multipart-only; JSON callers must use `url`.
</Note>
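
For callers that cannot send multipart data, the `url` variant can be sent as a JSON body. The sketch below only builds the request with the standard library so it can be inspected before sending. `build_excel_parse_request` is a hypothetical helper name, the `api-key` header mirrors the request examples on this page, and the workbook URL is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://prod.visionapi.unsiloed.ai"


def build_excel_parse_request(workbook_url, api_key, **config):
    """Build (but do not send) a JSON POST to /parse/excel.

    Hypothetical helper: `config` carries optional fields such as
    exclude_hidden or table_clustering as native JSON types.
    """
    payload = {"url": workbook_url, **config}
    return urllib.request.Request(
        f"{API_BASE}/parse/excel",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_excel_parse_request(
    "https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...",
    "your-api-key",
    exclude_hidden=True,
    table_clustering="fast",
)
# Send with urllib.request.urlopen(request) once the placeholders are replaced.
```

No `file` field appears in the payload, since binary uploads are only possible with `multipart/form-data`.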

<ParamField body="file" type="file">
  Excel workbook to process. Supported formats: `.xls`, `.xlsx`. Required if `url` is not provided.
</ParamField>

<ParamField body="url" type="string">
  Presigned or public URL of the workbook to fetch and process. Required if `file` is not provided.
</ParamField>

### Hidden content

<ParamField body="exclude_hidden" type="boolean">
  Drop hidden content from the output. Defaults to `false`. The five sub-toggles below are only honored when this is `true`.
</ParamField>

<ParamField body="exclude_hidden_sheets" type="boolean">
  When excluding hidden content, also drop entire hidden sheets. Defaults to `true`.
</ParamField>

<ParamField body="exclude_hidden_rows" type="boolean">
  When excluding hidden content, also drop hidden rows from visible sheets. Defaults to `true`.
</ParamField>

<ParamField body="exclude_hidden_cols" type="boolean">
  When excluding hidden content, also drop hidden columns from visible sheets. Defaults to `true`.
</ParamField>

<ParamField body="exclude_styling" type="boolean">
  When excluding hidden content, also drop styling. Defaults to `true`.
</ParamField>

<ParamField body="exclude_images" type="boolean">
  When excluding hidden content, also drop embedded and pasted images. Defaults to `false`.
</ParamField>
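
The interaction between `exclude_hidden` and its five sub-toggles can be sketched as a small client-side function. This is an illustration inferred from the parameter descriptions above, not the server's actual resolution logic, and `effective_exclusions` is a hypothetical name.

```python
# Documented defaults for the five sub-toggles.
SUB_TOGGLE_DEFAULTS = {
    "exclude_hidden_sheets": True,
    "exclude_hidden_rows": True,
    "exclude_hidden_cols": True,
    "exclude_styling": True,
    "exclude_images": False,
}


def effective_exclusions(params):
    """Return which sub-toggles would take effect for a given request.

    Assumption: when exclude_hidden is false or absent, every
    sub-toggle is ignored regardless of its own value.
    """
    if not params.get("exclude_hidden", False):
        # Master switch off: sub-toggles are not honored.
        return {name: False for name in SUB_TOGGLE_DEFAULTS}
    return {
        name: params.get(name, default)
        for name, default in SUB_TOGGLE_DEFAULTS.items()
    }
```

Under this reading, sending `exclude_hidden_rows=true` on its own changes nothing, while `exclude_hidden=true` alone drops hidden sheets, rows, columns, and styling but keeps images.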

### Table extraction

<ParamField body="split_large_tables" type="boolean">
  Split large tables into smaller segments to keep individual response items manageable. Defaults to `true`.
</ParamField>

<ParamField body="max_rows_per_segment" type="integer">
  Maximum number of rows in each split segment. Only effective when `split_large_tables` is `true`. Defaults to `50`.
</ParamField>
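
As a rough guide to how `max_rows_per_segment` affects output size, the sketch below estimates the segment count for one table. It assumes a plain even split by row count; how the pipeline treats header rows or partial segments is not documented here, so treat the result as an estimate. `estimated_segments` is a hypothetical helper.

```python
import math


def estimated_segments(table_rows, max_rows_per_segment=50, split_large_tables=True):
    """Estimate how many segments one table yields, assuming an even row split."""
    if table_rows <= 0:
        return 0
    if not split_large_tables:
        # Splitting disabled: the table stays in one piece.
        return 1
    if max_rows_per_segment <= 0:
        raise ValueError("max_rows_per_segment must be positive")
    return math.ceil(table_rows / max_rows_per_segment)
```

Under that assumption, a 230-row table produces 5 segments with the default of `50` and 3 segments with `max_rows_per_segment=100`.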

<ParamField body="table_clustering" type="string">
  How aggressively to detect distinct logical tables on the same sheet.

  * `"accurate"` **(default)**: Best fidelity at the cost of latency.
  * `"fast"`: Quicker clustering, may merge nearby tables.
  * `"off"`: Treat each sheet as a single table.
</ParamField>
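
Multipart form values are sent as strings, as in the request examples on this page. A minimal sketch of validating the table options client-side and converting them to form strings follows. `table_extraction_fields` is a hypothetical helper, and the positive-integer check is a client-side assumption rather than a documented server rule.

```python
TABLE_CLUSTERING_MODES = {"accurate", "fast", "off"}


def table_extraction_fields(split_large_tables=True, max_rows_per_segment=50,
                            table_clustering="accurate"):
    """Validate table options and return them as multipart form strings."""
    if table_clustering not in TABLE_CLUSTERING_MODES:
        raise ValueError(
            f"table_clustering must be one of {sorted(TABLE_CLUSTERING_MODES)}"
        )
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_rows_per_segment, bool)
            or not isinstance(max_rows_per_segment, int)
            or max_rows_per_segment < 1):
        raise ValueError("max_rows_per_segment must be a positive integer")
    return {
        "split_large_tables": "true" if split_large_tables else "false",
        "max_rows_per_segment": str(max_rows_per_segment),
        "table_clustering": table_clustering,
    }
```

The returned dictionary can be passed as the `data` argument of a multipart request alongside the `file` field.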

### Lifecycle

<ParamField body="expires_in" type="integer">
  Reserved field. Persisted on the task configuration but currently has no effect on retention — Excel tasks are not auto-deleted. To get a presigned-upload TTL for PDFs and other documents, use [`POST /v2/parse/upload`](/api-reference/parser/parse-document-v2) instead.
</ParamField>

## Response

The endpoint returns HTTP 200 with the same envelope as [`POST /parse`](/api-reference/parser/parse-document):

<ResponseField name="job_id" type="string" required>
  Job identifier. Pass this to `GET /parse/{job_id}` to poll for results.
</ResponseField>

<ResponseField name="status" type="string" required>
  Initial job status. Always `"Starting"` on creation.
</ResponseField>

<ResponseField name="file_name" type="string" required>
  Name of the uploaded workbook. For URL submissions this is the last path segment of the URL, or `"unknown"` when no usable segment exists.
</ResponseField>
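
To anticipate the `file_name` value for a URL submission, the "last path segment" rule can be approximated client-side. The exact server logic (percent-decoding, trailing slashes) is not documented here, so this sketch is an assumption, and `expected_file_name` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def expected_file_name(workbook_url):
    """Approximate the file_name reported for a URL submission."""
    # The query string (for example a presigned signature) is not part of the path.
    path = urlsplit(workbook_url).path
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment or "unknown"
```

For a presigned URL such as `https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...`, this yields `workbook.xlsx`; a URL with an empty path yields `unknown`.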

<ResponseField name="created_at" type="string" required>
  ISO 8601 timestamp when the job was created.
</ResponseField>

<ResponseField name="message" type="string" required>
  Human-readable status message with a polling hint.
</ResponseField>

<ResponseField name="credit_used" type="integer" required>
  Number of credits deducted from your quota for this job.
</ResponseField>

<ResponseField name="quota_remaining" type="integer" required>
  Remaining quota after this job was deducted.
</ResponseField>

<ResponseField name="merge_tables" type="boolean" required>
  Reflects the table-merging flag stored on the job. Always `false` for Excel jobs — table merging is a PDF-only feature.
</ResponseField>

<RequestExample>
  ```bash cURL theme={null}
  curl -X POST 'https://prod.visionapi.unsiloed.ai/parse/excel' \
    -H 'accept: application/json' \
    -H 'api-key: your-api-key' \
    -H 'Content-Type: multipart/form-data' \
    -F 'file=@workbook.xlsx;type=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' \
    -F 'exclude_hidden=true' \
    -F 'split_large_tables=true' \
    -F 'max_rows_per_segment=100' \
    -F 'table_clustering=accurate'

  # Alternative: presigned / public URL instead of file upload
  # -F 'url=https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...'
  ```

  ```python Python theme={null}
  import requests

  url = "https://prod.visionapi.unsiloed.ai/parse/excel"
  headers = {"accept": "application/json", "api-key": "your-api-key"}

  data = {
      "exclude_hidden": "true",
      "split_large_tables": "true",
      "max_rows_per_segment": "100",
      "table_clustering": "accurate",
  }

  # Open the workbook in a context manager so the handle is closed even on errors
  with open("workbook.xlsx", "rb") as workbook:
      files = {"file": ("workbook.xlsx", workbook,
                        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")}
      response = requests.post(url, headers=headers, files=files, data=data)

  if response.status_code == 200:
      result = response.json()
      print(f"Job ID: {result['job_id']}")
      print(f"Status: {result['status']}")
      print(f"Credit used: {result['credit_used']}")
  else:
      print("Error:", response.status_code, response.text)

  # ========== Alternative: Use Presigned URL ==========
  # data = {
  #     "url": "https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...",
  #     "exclude_hidden": "true",
  # }
  # response = requests.post(url, headers=headers, data=data)
  ```

  ```javascript JavaScript theme={null}
  const formData = new FormData();

  const fileInput = document.querySelector('input[type="file"]');
  if (fileInput.files[0]) {
    formData.append('file', fileInput.files[0]);
  }

  formData.append('exclude_hidden', 'true');
  formData.append('split_large_tables', 'true');
  formData.append('max_rows_per_segment', '100');
  formData.append('table_clustering', 'accurate');

  const response = await fetch('https://prod.visionapi.unsiloed.ai/parse/excel', {
    method: 'POST',
    headers: {
      'accept': 'application/json',
      'api-key': 'your-api-key',
    },
    body: formData,
  });

  if (response.ok) {
    const result = await response.json();
    console.log(`Job ID: ${result.job_id}`);
    console.log(`Status: ${result.status}`);
  } else {
    console.error('Parsing failed:', response.status, await response.text());
  }

  // ========== Alternative: Use Presigned URL ==========
  // formData.append('url', 'https://your-bucket.s3.amazonaws.com/workbook.xlsx?signature=...');
  ```
</RequestExample>

<ResponseExample>
  ```json Success Response theme={null}
  {
    "job_id": "9b1f7a04-2c33-4f8e-9c92-6f8a2e84b3d1",
    "status": "Starting",
    "file_name": "workbook.xlsx",
    "created_at": "2026-06-17T14:22:08.901234Z",
    "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
    "credit_used": 3,
    "quota_remaining": 23692,
    "merge_tables": false
  }
  ```

  ```text 400 - Non-Excel File theme={null}
  Bad request: file is not a supported Excel format (.xls / .xlsx).
  ```

  ```text 400 - Missing Input theme={null}
  Either file or url must be provided
  ```

  ```json 402 - Insufficient Quota theme={null}
  {
    "message": "Insufficient quota",
    "status": "INSUFFICIENT_QUOTA",
    "quota_data": {
      "remaining": 0
    }
  }
  ```

  ```json 429 - Rate Limit Exceeded theme={null}
  {
    "error": "rate_limit_exceeded",
    "message": "Rate limit of 10 requests per second exceeded. Retry after 1s.",
    "retry_after": 1
  }
  ```

  ```text 503 - Service Unavailable theme={null}
  Service unavailable: job queue is at capacity. Retry after the duration indicated in the Retry-After header.
  ```
</ResponseExample>
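
The error examples above mix plain-text bodies (`400`, `503`) with JSON bodies (`402`, `429`), so a client may want one code path for both. The sketch below tries JSON first and falls back to the raw text. `parse_error_body` is a hypothetical helper, and it assumes JSON errors carry a top-level `message` field as in the examples.

```python
import json


def parse_error_body(body_text):
    """Return (message, details) from an error body that may be JSON or plain text."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Not JSON: treat the whole body as the message.
        return body_text.strip(), {}
    if isinstance(payload, dict):
        return str(payload.get("message", body_text)), payload
    # Valid JSON but not an object (for example a bare string or number).
    return body_text.strip(), {}


message, details = parse_error_body(
    '{"error": "rate_limit_exceeded", "message": "Rate limit hit", "retry_after": 1}'
)
```

For plain-text bodies the second element is an empty dictionary, so callers can read `details.get("retry_after")` without checking the body type first.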

## Retrieving Results

Use `GET /parse/{job_id}` (the shared polling endpoint) to check status and retrieve results. The result envelope is the same as for PDF jobs — chunks containing segments — and Excel segments include a `cell_references` field linking each segment back to its source sheet, address, and range.

```bash cURL theme={null}
curl -X GET "https://prod.visionapi.unsiloed.ai/parse/{job_id}" \
  -H "accept: application/json" \
  -H "api-key: your-api-key"
```

```python Python theme={null}
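import time

import requests


def get_excel_results(job_id, api_key, poll_interval_s=5, timeout_s=600):
    """Poll GET /parse/{job_id} until the job succeeds, fails, or times out.

    The interval and timeout defaults are arbitrary client-side choices.
    """
    headers = {"api-key": api_key}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"
    deadline = time.monotonic() + timeout_s

    while time.monotonic() < deadline:
        response = requests.get(status_url, headers=headers)
        response.raise_for_status()
        job = response.json()
        print(f"Status: {job['status']}")

        if job["status"] == "Succeeded":
            return job
        if job["status"] == "Failed":
            raise RuntimeError(f"Job failed: {job.get('message')}")

        time.sleep(poll_interval_s)

    raise TimeoutError(f"Job {job_id} did not finish within {timeout_s} seconds")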
```

See [Get Parse Job Status](/api-reference/parser/get-parse-job-status) for the full response schema and query parameters.

## Error Handling

| Status | Cause                                                              | Action                                                         |
| ------ | ------------------------------------------------------------------ | -------------------------------------------------------------- |
| `400`  | Missing `file`/`url`, non-Excel file type, or malformed parameters | Check the file extension and required fields                   |
| `401`  | Missing or invalid `api-key`                                       | Check your API key                                             |
| `402`  | Insufficient quota                                                 | Add credits to your account or renew your plan                 |
| `403`  | Access has been revoked                                            | Contact support                                                |
| `429`  | Rate limit (default 10 req/s) or billing usage cap hit             | Back off and retry after the `Retry-After` header value        |
| `500`  | Internal server error                                              | Retry with exponential backoff                                 |
| `503`  | Job queue at capacity                                              | Retry after the duration indicated in the `Retry-After` header |
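
The retry guidance in the table can be sketched as a single delay calculation. `retry_delay` is a hypothetical helper; it assumes `Retry-After` is given in seconds (the HTTP-date form is not handled) and falls back to exponential backoff with jitter when the header is absent. The `base` and `cap` values are arbitrary.

```python
import random

RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(status_code, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status_code not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form: fall through to backoff in this sketch
    # Exponential backoff (base * 2^attempt, capped) with jitter in [0.5, 1.0].
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
```

A plain `dict` lookup is case-sensitive, so pass a case-insensitive mapping (such as `response.headers` from `requests`) or normalize header names first.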


## OpenAPI

````yaml api-reference/parser/openapi-v1.json POST /parse/excel
openapi: 3.1.0
info:
  title: Unsiloed Parser API — v1
  description: >-
    The original document parsing API. Accepts multipart file uploads and
    URL-based processing. These endpoints have no version prefix in their URLs
    and are stable indefinitely.
  contact:
    name: Unsiloed
    url: https://unsiloed.ai
    email: hello@unsiloed.com
  license:
    name: ''
  version: 1.0.0
servers:
  - url: https://prod.visionapi.unsiloed.ai
    description: Production
security: []
tags:
  - name: Authentication
    description: API key management endpoints
  - name: Health
    description: Endpoint for checking the health of the service.
  - name: Parse (Vision-API Compatible)
    description: >-
      Vision-API compatible endpoints for parsing - accepts multipart form data
      with Vision-API parameter names
paths:
  /parse/excel:
    post:
      tags:
        - Parse (Vision-API Compatible)
      summary: POST /parse/excel
      description: >-
        Dedicated Excel (.xls/.xlsx) ingestion endpoint. Shares all auth,
        billing, quota,

        and rate-limit infrastructure with `POST /parse`, and is polled via the
        same

        `GET /parse/{job_id}`. Non-Excel uploads are rejected with a 400; submit
        those to

        `POST /parse` instead.


        Accepts two request shapes at the same path, chosen by `Content-Type`:

        - `multipart/form-data` — binary `.xls`/`.xlsx` upload (`file` field) or
        a `url` field.

        - `application/json` (or `application/x-www-form-urlencoded`) — JSON
        body with a
          required `url` field; the `file` field is not applicable.
      operationId: create_parse_task_excel
      requestBody:
        description: >-
          Provide either `file` (binary .xls/.xlsx upload, multipart only) or
          `url` (presigned/public URL, both content types), not both. The `file`
          field is multipart-only. Excel parsing uses its own config — PDF
          parsing options do not apply.
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ExcelParseCreateRequest'
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/ExcelParseCreateRequest'
        required: true
      responses:
        '200':
          description: Job created — poll with GET /parse/{job_id} to retrieve results.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ParseCreateResponse'
        '400':
          description: >-
            Bad request — missing file/url, non-Excel file type, or invalid
            parameters.
          content:
            text/plain:
              schema:
                type: string
        '401':
          description: Unauthorized
        '402':
          description: Insufficient quota — not enough page credits remaining.
          content:
            text/plain:
              schema:
                type: string
        '403':
          description: Forbidden — access has been revoked.
        '429':
          description: Usage limit exceeded (billing cap) or rate limit hit.
          content:
            text/plain:
              schema:
                type: string
        '500':
          description: Internal server error.
          content:
            text/plain:
              schema:
                type: string
        '503':
          description: Service unavailable — job queue is at capacity.
          content:
            text/plain:
              schema:
                type: string
      security:
        - api_key: []
components:
  schemas:
    ExcelParseCreateRequest:
      type: object
      description: >-
        Request body for `POST /parse/excel` (multipart/form-data).


        Excel parsing has its own configuration — none of the PDF parsing
        options

        (OCR, layout analysis, segment processing) apply. Provide either `file`

        (binary .xls/.xlsx upload) or `url`, not both. Every config field is
        optional

        and defaults to the Excel pipeline's own default.
      required:
        - file
      properties:
        exclude_hidden:
          type:
            - boolean
            - 'null'
          description: >-
            Drop hidden sheets/rows/cols/styling from the output. Defaults to
            `false`.
          default: false
        exclude_hidden_cols:
          type:
            - boolean
            - 'null'
          description: >-
            When excluding hidden content, also drop hidden columns. Defaults to
            `true`.
          default: true
        exclude_hidden_rows:
          type:
            - boolean
            - 'null'
          description: >-
            When excluding hidden content, also drop hidden rows. Defaults to
            `true`.
          default: true
        exclude_hidden_sheets:
          type:
            - boolean
            - 'null'
          description: >-
            When excluding hidden content, also drop hidden sheets. Defaults to
            `true`.
          default: true
        exclude_images:
          type:
            - boolean
            - 'null'
          description: >-
            When excluding hidden content, also drop embedded/pasted images.
            Defaults to `false`.
          default: false
        exclude_styling:
          type:
            - boolean
            - 'null'
          description: >-
            When excluding hidden content, also drop styling. Defaults to
            `true`.
          default: true
        expires_in:
          type:
            - integer
            - 'null'
          format: int32
          description: >-
            Reserved field. Persisted in the task configuration but currently
            has no

            effect on retention — Excel tasks use the same `Task::new_fast`
            creation

            path as `POST /parse`, which does not set the task's `expires_at`
            column.

            See the same note on `ParseCreateRequest.expires_in`.
        file:
          type: string
          format: binary
          description: >-
            Excel workbook to process. Required if `url` is not provided.
            Supported: XLS, XLSX.
        max_rows_per_segment:
          type:
            - integer
            - 'null'
          format: int32
          description: >-
            Max rows per split segment when `split_large_tables`. Defaults to
            `50`.
          default: 50
        split_large_tables:
          type:
            - boolean
            - 'null'
          description: Split large tables into smaller segments. Defaults to `true`.
          default: true
        table_clustering:
          type:
            - string
            - 'null'
          description: 'Table clustering effort: `accurate` (default), `fast`, or `off`.'
          default: accurate
        url:
          type:
            - string
            - 'null'
          description: >-
            Presigned or public URL of the workbook to fetch. Required if `file`
            is not provided.
    ParseCreateResponse:
      type: object
      description: Response body for a successful `POST /parse` call.
      required:
        - job_id
        - status
        - file_name
        - created_at
        - message
        - credit_used
        - quota_remaining
        - merge_tables
      properties:
        created_at:
          type: string
          description: ISO 8601 timestamp when the job was created.
        credit_used:
          type: integer
          format: int32
          description: Number of pages deducted from your quota for this job.
        file_name:
          type: string
          description: Name of the uploaded file or `"unknown"` when a URL was provided.
        job_id:
          type: string
          description: >-
            Job identifier — pass this to `GET /parse/{job_id}` to poll for
            results.
        merge_tables:
          type: boolean
          description: >-
            Whether table merging is enabled for this job (reflects the
            submitted `merge_tables` value).
        message:
          type: string
          description: Human-readable status message with a polling hint.
        quota_remaining:
          type: integer
          format: int64
          description: Remaining page quota after this job was deducted.
        status:
          type: string
          description: Initial job status. Always `"Starting"` on creation.
  securitySchemes:
    api_key:
      type: http
      scheme: bearer
      description: API key for authentication. Use 'Bearer <your_api_key>'

````