# Split Document

> Split PDF documents by classifying pages into different categories

## Overview

The Split Document endpoint analyzes the pages of a PDF, classifies them into the categories you supply, and creates a separate PDF file for each category. This is useful for mixed batches, such as a single scan that contains invoices, contracts, and reports.

<Note>
  The endpoint is asynchronous. It returns a `job_id` immediately and processes the document in the background. Poll `GET /splitter/{job_id}` to retrieve the result once the job completes.
</Note>
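
The polling loop described in the note can be sketched as follows. This is a minimal sketch rather than an official client. It assumes the `GET /splitter/{job_id}` response carries a top-level `status` field and that any value other than "processing" means the job has finished. The helper names `fetch_job` and `wait_for_split`, the polling interval, and the timeout are illustrative choices.

```python
import json
import time
import urllib.request

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def fetch_job(job_id, api_key):
    """GET /splitter/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/splitter/{job_id}",
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_split(job_id, api_key, fetch=fetch_job, interval=2.0,
                   timeout=300.0, sleep=time.sleep):
    """Poll until the job leaves the "processing" state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id, api_key)
        # Assumption: any status other than "processing" is terminal.
        if job.get("status") != "processing":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Split job {job_id} is still processing after {timeout}s")
        sleep(interval)
```

`fetch` and `sleep` are parameters so the loop can be exercised without network access. In normal use, `wait_for_split(job_id, "your-api-key")` is enough.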

## Request

<ParamField body="file" type="file">
  The PDF file to split. Provide either `file` or `file_url`, not both; sending both returns a 400.
</ParamField>

<ParamField body="file_url" type="string">
  URL of a PDF file to split. Provide either `file` or `file_url`, not both; sending both returns a 400.
</ParamField>

<ParamField body="categories" type="string" required>
  A JSON string containing an array of category objects, each with a `name` and an optional `description` (e.g., `[{"name":"invoice","description":"Financial invoices"}]`). Descriptions help the classifier disambiguate similar categories. Categories that match no pages are skipped, and no file is created for them.
</ParamField>
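
Because `categories` is sent as a JSON string rather than a nested form field, it can help to validate and serialize it in one place. The helper below is a sketch under stated assumptions: the name `encode_categories` is hypothetical, and rejecting an empty list is a client-side choice, not a documented server rule.

```python
import json


def encode_categories(categories):
    """Validate category dicts and return the JSON string for the `categories` field."""
    # Assumption: an empty list is rejected here; the API's behavior is not documented.
    if not isinstance(categories, list) or not categories:
        raise ValueError("categories must be a non-empty list")
    cleaned = []
    for category in categories:
        name = str(category.get("name", "")).strip()
        if not name:
            raise ValueError(f"category is missing a name: {category!r}")
        entry = {"name": name}
        # The description is optional, so include it only when present.
        description = category.get("description")
        if description:
            entry["description"] = description
        cleaned.append(entry)
    return json.dumps(cleaned)
```

The returned string can be passed directly as the `categories` form value.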

<ParamField body="enable_reordering" type="boolean" default="false">
  Reorder pages within each category after classification, using content and page numbers to infer the logical order. Only applied to categories that match more than one page.
</ParamField>
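
The request parameters above can be assembled into multipart form fields as sketched below, including the `file_url` and `enable_reordering` options. The function name `build_multipart_fields` is hypothetical, and sending the boolean as the string "true" or "false" is an assumption about how the server parses form values. The `(None, value)` tuples follow the `requests` convention for sending a plain multipart field without a filename.

```python
def build_multipart_fields(categories_json, file_url=None, file_path=None,
                           enable_reordering=False):
    """Return text fields for a multipart POST to /splitter.

    Exactly one of file_url or file_path must be given, mirroring the
    rule that `file` and `file_url` cannot be sent together.
    """
    if (file_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file_url or file_path")
    fields = {
        "categories": (None, categories_json),
        # Assumption: the server accepts "true"/"false" for boolean form fields.
        "enable_reordering": (None, "true" if enable_reordering else "false"),
    }
    if file_url is not None:
        fields["file_url"] = (None, file_url)
    # For an upload, the caller adds the open handle itself:
    # fields["file"] = ("mixed_documents.pdf", handle, "application/pdf")
    return fields


# Usage with requests (not executed here):
# fields = build_multipart_fields(categories_json,
#                                 file_url="https://example.com/mixed_documents.pdf",
#                                 enable_reordering=True)
# requests.post("https://prod.visionapi.unsiloed.ai/splitter",
#               files=fields, headers={"api-key": "your-api-key"})
```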

## Response

On success, the endpoint returns HTTP 200 with the following fields:

<ResponseField name="job_id" type="string">
  Unique identifier for the splitting job
</ResponseField>

<ResponseField name="status" type="string">
  Current status of the job (typically "processing" immediately after submission)
</ResponseField>

<ResponseField name="quota_remaining" type="number">
  Remaining API quota after this request
</ResponseField>

## Split Result

When the job completes, `GET /splitter/{job_id}` returns the split files inside its `result` object. The fields below describe that `result` object:

<ResponseField name="success" type="boolean">
  Whether the splitting operation succeeded
</ResponseField>

<ResponseField name="message" type="string">
  Descriptive message about the splitting operation
</ResponseField>

<ResponseField name="files" type="array">
  Array of split PDF files with their metadata

  <Expandable title="file_structure">
    <ResponseField name="name" type="string">
      Category-derived filename of the split PDF (e.g., "invoice.pdf" for the "invoice" category)
    </ResponseField>

    <ResponseField name="fileId" type="string">
      Unique identifier for the file in storage
    </ResponseField>

    <ResponseField name="type" type="string">
      File type (always "file")
    </ResponseField>

    <ResponseField name="path" type="string">
      Relative path to the file in storage
    </ResponseField>

    <ResponseField name="full_path" type="string">
      Presigned download URL for the split PDF file. Expires roughly an hour after the response is generated, so download the file straight away rather than storing the URL. Re-issue `GET /splitter/{job_id}` to get a fresh URL.
    </ResponseField>

    <ResponseField name="confidence_score" type="number">
      Classification confidence for this file (0-1), averaged across all pages assigned to the category
    </ResponseField>
  </Expandable>
</ResponseField>
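
One way to consume the `result` object is to separate low-confidence files for manual review and download the rest before their presigned URLs expire. The sketch below assumes `result` has the shape described above. The function names and the 0.7 threshold are illustrative and do not come from the API.

```python
import urllib.request
from pathlib import Path


def split_by_confidence(result, threshold=0.7):
    """Return (accepted, needs_review) lists of file entries from a `result` object."""
    accepted, needs_review = [], []
    for entry in result.get("files", []):
        # Entries without a score are treated as 0.0 and sent to review.
        score = entry.get("confidence_score", 0.0)
        (accepted if score >= threshold else needs_review).append(entry)
    return accepted, needs_review


def download_files(entries, dest_dir):
    """Download each entry's `full_path` URL into dest_dir and return the saved paths."""
    destination = Path(dest_dir)
    destination.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in entries:
        # Path(...).name drops any directory components from the reported filename.
        target = destination / Path(entry["name"]).name
        with urllib.request.urlopen(entry["full_path"], timeout=60) as response:
            target.write_bytes(response.read())
        saved.append(target)
    return saved
```

Calling `download_files` immediately after the job completes avoids working with an expired `full_path`.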

## Request Examples

<RequestExample>
  ```bash cURL theme={null}
  curl -X POST "https://prod.visionapi.unsiloed.ai/splitter" \
    -H "api-key: your-api-key" \
    -F "file=@mixed_documents.pdf" \
    -F 'categories=[{"name":"invoice","description":"Business invoices with itemized charges"},{"name":"contract","description":"Legal agreements and binding documents"}]'
  ```

  ```python Python theme={null}
  import requests
  import json

  url = "https://prod.visionapi.unsiloed.ai/splitter"

  # Define categories with descriptions for better accuracy
  categories = [
      {"name": "invoice", "description": "Business invoices with itemized charges and payment terms"},
      {"name": "contract", "description": "Legal agreements, terms of service, and binding documents"},
      {"name": "report", "description": "Analytical reports, summaries, and data presentations"},
      {"name": "letter", "description": "Correspondence, memos, and communication documents"}
  ]

  data = {"categories": json.dumps(categories)}
  headers = {"api-key": "your-api-key"}

  # Open the PDF in a context manager so the handle is closed even if the request fails
  with open("mixed_documents.pdf", "rb") as pdf:
      files = {"file": ("mixed_documents.pdf", pdf, "application/pdf")}
      response = requests.post(url, files=files, data=data, headers=headers)

  if response.status_code == 200:
      result = response.json()
      print(f"Job ID: {result['job_id']}")
      print(f"Status: {result['status']}")
      print(f"Quota Remaining: {result['quota_remaining']}")
  else:
      print("Error:", response.status_code, response.text)
  ```

  ```javascript JavaScript theme={null}
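  // Assumes a browser context where fileInput is an <input type="file"> element,
  // and that this code runs in a module or async function (it uses await).
  const formData = new FormData();
  formData.append('file', fileInput.files[0]);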

  const categories = [
    {name: "invoice", description: "Business invoices with itemized charges"},
    {name: "contract", description: "Legal agreements and binding documents"},
    {name: "report", description: "Analytical reports and data presentations"}
  ];
  formData.append('categories', JSON.stringify(categories));

  const response = await fetch('https://prod.visionapi.unsiloed.ai/splitter', {
    method: 'POST',
    headers: {
      'api-key': 'your-api-key'
    },
    body: formData
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Job ID:', result.job_id);
    console.log('Status:', result.status);
    console.log('Quota Remaining:', result.quota_remaining);
  } else {
    console.error('Split failed:', response.status, await response.text());
  }
  ```
</RequestExample>

## Response Examples

<ResponseExample>
  ```json Split Result (from GET /splitter/{job_id}) theme={null}
  {
    "success": true,
    "message": "Successfully split PDF into 2 files",
    "files": [
      {
        "name": "invoice.pdf",
        "fileId": "d079d09f-201c-4420-a50a-b25678a71ae9",
        "type": "file",
        "path": "invoice.pdf",
        "full_path": "https://example-bucket.s3.amazonaws.com/files/ef3ec356-b407-4f9f-ac8f-0dfdef9034c0_invoice.pdf?AWSAccessKeyId=...&Signature=...&Expires=...",
        "confidence_score": 0.8
      },
      {
        "name": "contract.pdf",
        "fileId": "320616cc-8dfd-4b8a-8474-8e7a42d9e287",
        "type": "file",
        "path": "contract.pdf",
        "full_path": "https://example-bucket.s3.amazonaws.com/files/dfaa5d30-6955-4a69-9c69-7e3c4efd8450_contract.pdf?AWSAccessKeyId=...&Signature=...&Expires=...",
        "confidence_score": 0.8
      }
    ]
  }
  ```

  ```json Error Response - Invalid File theme={null}
  {
    "detail": "File must be a PDF"
  }
  ```

  ```json Error Response - Invalid Categories theme={null}
  {
    "detail": "Categories must be a JSON array"
  }
  ```

  ```json Error Response - Service Unavailable theme={null}
  {
    "detail": "Failed to start split job. Please retry."
  }
  ```
</ResponseExample>
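
The error bodies above share a single `detail` field, so a client can surface them with one small helper. This is a sketch: the name `error_detail` is hypothetical, and because the status codes that accompany each error are not listed here, the helper reports whatever code it is given.

```python
import json


def error_detail(status_code, body_text):
    """Return a readable message from an error response body."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Not JSON (for example, a proxy error page); fall back to the raw text.
        payload = None
    if isinstance(payload, dict) and "detail" in payload:
        return f"HTTP {status_code}: {payload['detail']}"
    return f"HTTP {status_code}: {body_text.strip() or 'no response body'}"
```

With `requests`, this would be called as `error_detail(response.status_code, response.text)` in the non-200 branch.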


## OpenAPI

````yaml api-reference/openapi.json POST /splitter
openapi: 3.1.0
info:
  title: Unsiloed AI Document Processing API
  description: >-
    A comprehensive API for document processing, extraction, and analysis using
    AI-powered tools
  license:
    name: MIT
  version: 1.0.0
servers:
  - url: https://prod.visionapi.unsiloed.ai
    description: Production server
security:
  - apiKeyAuth: []
paths:
  /splitter:
    post:
      summary: Split Document
      description: Split PDF documents by classifying pages into categories
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: PDF file to split. Either file or file_url must be provided.
                file_url:
                  type: string
                  description: >-
                    URL to a PDF file to split. Either file or file_url must be
                    provided. Example: https://example.com/mixed_documents.pdf
                categories:
                  type: string
                  description: >-
                    JSON string containing array of category objects with name
                    and optional description. Example:
                    [{"name":"invoice","description":"Business invoices with
                    itemized charges"},{"name":"contract","description":"Legal
                    agreements and binding documents"},{"name":"report"}]
                enable_reordering:
                  anyOf:
                    - type: boolean
                    - type: 'null'
                  title: Enable Reordering
                  description: >-
                    Reorder pages within each category after classification,
                    using content and page numbers to infer the logical order.
                    Only applied to categories that match more than one page.
                  default: false
              required:
                - categories
            encoding:
              file:
                contentType: application/pdf
      responses:
        '200':
          description: Split job created
          content:
            application/json:
              schema:
                type: object
                properties:
                  job_id:
                    type: string
                    description: Unique identifier for the splitting job
                  status:
                    type: string
                    description: Current job status (typically 'processing')
                  quota_remaining:
                    type: number
                    description: Remaining API quota after this request
      security:
        - apiKeyAuth: []
components:
  securitySchemes:
    apiKeyAuth:
      type: apiKey
      in: header
      name: api-key

````