POST /splitter/split-pdf

Overview

The Split Document endpoint analyzes PDF pages, classifies them into predefined categories, and creates separate PDF files for each category. This is ideal for processing mixed document batches like scanned files containing invoices, contracts, and reports.

The endpoint returns a ZIP file containing the split documents, with classification and confidence score data provided in response headers.

Request

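file (file, required)

The PDF file to split, sent as a multipart form field. Maximum file size: 100MB.

classes (string, required)

Comma-separated list of classification categories (e.g., "invoice,contract,report"). The request example passes it as a query parameter.

categories (string, optional)

JSON string that maps each category name to a detailed description, used to improve classification accuracy. The request example sends it as a form field.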
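
As a minimal sketch of how these parameters fit together, the helper below assembles the request URL and the `categories` form field using only the Python standard library. It assumes, based on the curl example on this page, that `classes` travels in the query string and `categories` is a JSON-encoded form field. The function name `build_split_request` is hypothetical and not part of the API.

```python
import json
from urllib.parse import urlencode

# Endpoint URL taken from the curl example in this document.
SPLIT_URL = "https://visionapi.unsiloed.ai/splitter/split-pdf"


def build_split_request(classes, descriptions=None):
    """Return (url, form_fields) for a split request.

    Assumption: `classes` goes in the query string and `categories`
    is a JSON-encoded form field, mirroring the curl example.
    """
    # Keep commas literal so the query reads "classes=a,b,c".
    query = urlencode({"classes": ",".join(classes)}, safe=",")
    form_fields = {}
    if descriptions:
        form_fields["categories"] = json.dumps(descriptions)
    return f"{SPLIT_URL}?{query}", form_fields


url, fields = build_split_request(
    ["invoice", "contract", "report"],
    {"invoice": "Business invoices with itemized charges"},
)
print(url)
```

The returned URL and form fields can then be handed to any HTTP client, together with the PDF attached as the `file` multipart part.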

Response

The endpoint returns a ZIP file containing the split PDF documents, with additional metadata in response headers.

Response Headers

Content-Type (string)

application/zip

Content-Disposition (string)

attachment; filename=classified_pdfs.zip

X-Classifications (string)

JSON string that maps each page number to its assigned category, e.g. {"1": "invoice"}

X-Confidence-Scores (string)

JSON string that maps each page number to the confidence score of its classification, e.g. {"1": 0.95}
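
The two metadata headers can be combined into one per-page record. The sketch below assumes both headers hold JSON objects keyed by page number as strings, as in the example response on this page. `parse_split_headers` is a hypothetical helper name.

```python
import json


def parse_split_headers(headers):
    """Combine X-Classifications and X-Confidence-Scores per page.

    Assumption: both headers are JSON objects keyed by page number
    (as strings), matching the example response in this document.
    """
    labels = json.loads(headers["X-Classifications"])
    scores = json.loads(headers.get("X-Confidence-Scores", "{}"))
    pages = {}
    for page, label in labels.items():
        # A page with no score gets confidence None rather than failing.
        pages[int(page)] = {"category": label, "confidence": scores.get(page)}
    return pages


example_headers = {
    "X-Classifications": '{"1": "invoice", "2": "invoice", "3": "contract", "4": "report"}',
    "X-Confidence-Scores": '{"1": 0.95, "2": 0.87, "3": 0.92, "4": 0.78}',
}
pages = parse_split_headers(example_headers)
print(pages[3])  # {'category': 'contract', 'confidence': 0.92}
```

A plain dictionary is case-sensitive, so when reading headers from a real HTTP client, use that client's own header mapping, which is typically case-insensitive.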

ZIP Contents

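The ZIP file contains one PDF for each category found in the document, named after the category. For the classes in the request example:

  • invoice.pdf - Pages classified as invoices
  • contract.pdf - Pages classified as contracts
  • report.pdf - Pages classified as reports

Any other category you request follows the same naming pattern.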
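
One way to consume the archive without writing it to disk is sketched below. It assumes each member is named `<category>.pdf`, as in the list above. The helper `pdfs_by_category` is hypothetical, and the example builds a small stand-in archive so it runs without calling the API.

```python
import io
import zipfile


def pdfs_by_category(zip_bytes):
    """Map category name -> PDF bytes for each .pdf member of the archive.

    Assumption: members are named "<category>.pdf".
    """
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".pdf"):
                # Drop any folder prefix and the ".pdf" suffix.
                category = name.rsplit("/", 1)[-1][:-4]
                result[category] = archive.read(name)
    return result


# Stand-in archive with placeholder bytes, not real API output.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("invoice.pdf", b"%PDF-1.4 stand-in")
    archive.writestr("contract.pdf", b"%PDF-1.4 stand-in")

print(sorted(pdfs_by_category(buffer.getvalue())))  # ['contract', 'invoice']
```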

Request Examples

curl -X POST "https://visionapi.unsiloed.ai/splitter/split-pdf?classes=invoice,contract,report" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@mixed_documents.pdf" \
  -F 'categories={"invoice":"Business invoices with itemized charges","contract":"Legal agreements and binding documents"}' \
  --output split_documents.zip

Response Examples

HTTP/1.1 200 OK
Content-Type: application/zip
Content-Disposition: attachment; filename=classified_pdfs.zip
Content-Length: 2048576
X-Classifications: {"1": "invoice", "2": "invoice", "3": "contract", "4": "report"}
X-Confidence-Scores: {"1": 0.95, "2": 0.87, "3": 0.92, "4": 0.78}

[ZIP file containing invoice.pdf, contract.pdf, and report.pdf]

Best Practices

Category Descriptions: Always provide detailed category descriptions. This can improve classification accuracy by 20-30%.

File Quality: Ensure PDFs contain readable text. Scanned documents should be OCR-processed first for better results.

Category Selection: Use 3-7 categories for optimal accuracy. Too many categories can reduce precision.

File Size: Large files (more than 50 pages) may time out. Consider pre-processing very large documents, for example by dividing them into smaller files before uploading.

Text Quality: The service relies on text extraction. Poor quality scans or image-only PDFs may not classify accurately.
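
Several of these recommendations can be checked before a request is sent. The sketch below is a hypothetical client-side preflight check, not part of the API. It assumes the 100MB limit means 100 × 1024 × 1024 bytes, which this page does not specify.

```python
# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def preflight_warnings(classes, descriptions, file_size_bytes):
    """Return a list of warnings; an empty list means no issues were found."""
    warnings = []
    if not 3 <= len(classes) <= 7:
        warnings.append(f"{len(classes)} categories given; 3-7 is recommended")
    missing = [c for c in classes if not descriptions.get(c)]
    if missing:
        warnings.append("no description for: " + ", ".join(missing))
    if file_size_bytes > MAX_FILE_BYTES:
        warnings.append("file exceeds the 100MB limit")
    return warnings


# Same classes and descriptions as the request example: "report" has no description.
print(preflight_warnings(
    ["invoice", "contract", "report"],
    {"invoice": "Business invoices with itemized charges",
     "contract": "Legal agreements and binding documents"},
    2_048_576,
))  # ['no description for: report']
```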

Supported Document Types

The splitting system works best with:

  • Business Documents: Invoices, receipts, purchase orders, contracts
  • Financial Documents: Bank statements, financial reports, tax forms
  • Legal Documents: Contracts, agreements, legal notices, compliance forms
  • Healthcare Documents: Medical records, insurance forms, lab reports
  • HR Documents: Resumes, employment forms, payroll documents
  • Academic Documents: Research papers, reports, transcripts

Classification accuracy varies by document type and quality. Documents with distinct visual layouts and clear textual content typically achieve 85-95% accuracy.
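
Because accuracy varies, it can help to route low-confidence pages to manual review. The sketch below reads the `X-Confidence-Scores` header value and lists the pages that fall under a cutoff. The 0.8 threshold and the name `pages_needing_review` are illustrative assumptions, not values recommended by the API.

```python
import json


def pages_needing_review(confidence_header, threshold=0.8):
    """Return sorted page numbers whose confidence is below `threshold`."""
    scores = json.loads(confidence_header)
    return sorted(int(page) for page, score in scores.items() if score < threshold)


# Header value copied from the example response in this document.
print(pages_needing_review('{"1": 0.95, "2": 0.87, "3": 0.92, "4": 0.78}'))  # [4]
```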

Getting Started

Ready to start splitting your documents? Check out our quickstart guide or try the API in our playground.