We’ll walk through splitting a single bundled PDF into separate category-specific files. For classifying a document (not splitting it), see Getting Started With Classification.

Before You Start

  • Get an Unsiloed AI API key by signing up.
  • Have a PDF that contains multiple documents bundled together.
  • Decide on the candidate categories the bundle should split into.
Keep your API key out of source control. The examples below read it from the UNSILOED_API_KEY environment variable.
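
As a minimal sketch, the key can be read with a clearer error than a bare KeyError when the variable is missing. The helper name load_api_key is hypothetical and not part of any Unsiloed SDK; only the UNSILOED_API_KEY variable name comes from this guide.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of any Unsiloed SDK.
    """
    key = env.get("UNSILOED_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UNSILOED_API_KEY is not set; export it in your shell instead of "
            "hard-coding the key in source files."
        )
    return key
```

Passing the environment in as a mapping keeps the helper easy to exercise in tests without touching the real process environment.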

1. Submit a Bundled PDF With Categories

The /splitter endpoint accepts a POST request with a multipart upload containing two fields: file (the bundled PDF) and categories (a JSON-encoded list of category objects, each with a name and an optional description). The response includes a job_id that you use to poll for results.
import json
import os

import requests

# Read the key from the environment rather than hard-coding it.
API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

# Each category needs a name; the description is optional.
categories = [
    {"name": "Invoice", "description": "Financial invoices with itemized charges"},
    {"name": "Receipt", "description": "Purchase receipts"},
    {"name": "Contract"},
]

with open("mixed_documents.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/splitter",
        headers={"api-key": API_KEY},
        files={"file": ("mixed_documents.pdf", f, "application/pdf")},
        # Form fields are sent as strings, so the category list is JSON-encoded.
        data={"categories": json.dumps(categories)},
        timeout=60,  # avoid hanging indefinitely on a stalled connection
    )
# Raise on 4xx/5xx responses before reading the body.
response.raise_for_status()

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")
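
The shape of the categories field described above (a list of objects, each with a name and an optional description) can be checked locally before uploading. The sketch below is one way to do that; build_categories_field is a hypothetical helper, and the rules about rejecting empty lists and duplicate names are assumptions made for illustration, not documented server-side behavior.

```python
import json


def build_categories_field(categories):
    """Validate category dicts and return the JSON string for the form field.

    Hypothetical helper. The empty-list and duplicate-name checks are
    assumptions for illustration, not documented API rules.
    """
    if not categories:
        raise ValueError("provide at least one category")
    cleaned = []
    seen = set()
    for cat in categories:
        name = str(cat.get("name", "")).strip()
        if not name:
            raise ValueError(f"category is missing a name: {cat!r}")
        if name in seen:
            raise ValueError(f"duplicate category name: {name!r}")
        seen.add(name)
        entry = {"name": name}
        # The description is optional, so only include it when provided.
        description = cat.get("description")
        if description:
            entry["description"] = description
        cleaned.append(entry)
    return json.dumps(cleaned)
```

The returned string could be passed as data={"categories": ...} in place of the inline json.dumps call, so malformed input fails before any upload happens.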

2. Poll for Results

GET /splitter/{job_id} returns the job’s current state, so you can poll it until the job finishes. A status of completed means the split files are ready, failed means the job errored, and any other value (such as processing) means the job is still running.
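import time

# Continues from the submission example: requests, API_KEY, BASE_URL, and job_id are already defined.
while True:
    response = requests.get(
        f"{BASE_URL}/splitter/{job_id}",
        headers={"api-key": API_KEY},
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    result = response.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "split job failed"))
    time.sleep(5)  # wait before polling again

# List each split file's name, path, and confidence score.
for file_info in result["result"]["files"]:
    print(f"{file_info['name']}: {file_info['full_path']} (confidence={file_info['confidence_score']:.2%})")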
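
A polling loop with no upper bound will wait forever if a job never leaves the processing state. The sketch below adds a deadline around the same completed / failed / still-running logic. wait_for_job and its parameters are hypothetical names, and the default timeout and interval are arbitrary choices rather than recommended values.

```python
import time


def wait_for_job(fetch_status, timeout_s=300.0, interval_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job completes, fails, or times out.

    fetch_status is any zero-argument callable that returns the parsed JSON
    body of GET /splitter/{job_id}. Hypothetical helper; the defaults are
    arbitrary, not values recommended by the API.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "split job failed"))
        # Any other status (such as "processing") means the job is still running.
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the variables from the examples above, the callable could be lambda: requests.get(f"{BASE_URL}/splitter/{job_id}", headers={"api-key": API_KEY}, timeout=30).json(). Taking the fetcher, sleep, and clock as arguments keeps the loop testable without network access.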
See the Response Format reference for the full response shape.
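
Once the job completes, the per-file confidence_score can be used to decide which split files need a manual look. This is a sketch only: partition_by_confidence is a hypothetical helper, the 0.8 threshold is an arbitrary example, and it assumes confidence_score is a fraction between 0 and 1, as the percentage formatting in the polling example suggests.

```python
def partition_by_confidence(files, threshold=0.8):
    """Split file entries into (confident, needs_review) lists.

    Assumes each entry has a confidence_score between 0 and 1. The default
    threshold is an arbitrary example value, not an API recommendation.
    """
    confident, needs_review = [], []
    for file_info in files:
        if file_info["confidence_score"] >= threshold:
            confident.append(file_info)
        else:
            needs_review.append(file_info)
    return confident, needs_review
```

Calling partition_by_confidence(result["result"]["files"]) on a completed job's response would return the two lists in the order the files appeared.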