Overview

The document classification system uses vision models to categorize documents by their content, structure, and visual elements. It returns a classification with a confidence score for both single-page and multi-page documents.
Classification jobs are processed asynchronously. Submit a classification job and poll the status endpoint to retrieve results when complete.

Key Features

AI-Powered Analysis

Leverage advanced vision models to understand document content and structure

Multi-Page Support

Analyze entire documents with page-by-page classification and aggregation

Confidence Scoring

Receive a confidence score for the overall classification and for each page

Custom Categories

Define custom classification categories with optional descriptions

API Usage

Classify Document

from unsiloed_sdk import UnsiloedClient

# Define categories with optional descriptions
categories = [
    {"name": "Medical Record", "description": "Patient medical records and history"},
    {"name": "Lab Report", "description": "Laboratory test results"},
    {"name": "Prescription"}  # Description is optional
]

with UnsiloedClient(api_key="your-api-key") as client:
    # Classify and wait for completion
    result = client.classify_and_wait(
        file="document.pdf",
        categories=categories
    )

    print(f"Classification: {result.result['classification']}")
    print(f"Confidence: {result.result['confidence']:.2%}")

    # Check page-by-page results if available
    if 'page_results' in result.result:
        for page_result in result.result['page_results']:
            print(f"Page {page_result['page']}: {page_result['classification']}")
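
Because malformed categories are rejected as invalid parameters, it can help to check the list locally before submitting. The following is a minimal sketch, not part of the SDK: `validate_categories` is a hypothetical helper, and the rules it enforces (non-empty name, string description, no duplicate names) are assumptions based on the category shape shown above.

```python
def validate_categories(categories):
    """Return a cleaned copy of the category list, or raise ValueError.

    Hypothetical helper: the rules here are assumptions, not documented
    server-side validation.
    """
    if not isinstance(categories, list) or not categories:
        raise ValueError("categories must be a non-empty list")

    cleaned = []
    seen = set()
    for index, category in enumerate(categories):
        if not isinstance(category, dict):
            raise ValueError(f"category {index} must be a dict")

        name = category.get("name")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"category {index} needs a non-empty 'name'")
        name = name.strip()

        # Assumption: duplicate names would make results ambiguous.
        if name in seen:
            raise ValueError(f"duplicate category name: {name!r}")
        seen.add(name)

        entry = {"name": name}
        description = category.get("description")
        if description is not None:
            if not isinstance(description, str):
                raise ValueError(f"category {index} 'description' must be a string")
            entry["description"] = description
        cleaned.append(entry)
    return cleaned
```

The cleaned list can then be passed as the `categories` argument in the example above.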

Check Classification Job Status

from unsiloed_sdk import UnsiloedClient

def check_classification_status(job_id: str, api_key: str):
    with UnsiloedClient(api_key=api_key) as client:
        # Get classification result
        job = client.get_classify_result(job_id)

        print(f"Status: {job.status}")
        print(f"Progress: {job.progress}")

        if job.status == "completed" and job.result:
            result = job.result
            print(f"\nClassification: {result['classification']}")
            print(f"Confidence: {result['confidence']:.2%}")
            print(f"Total Pages: {result.get('total_pages', 'N/A')}")

            if 'page_results' in result:
                for page_result in result['page_results']:
                    print(f"Page {page_result['page']}: {page_result['classification']}")
        
        return job

Poll for Completion

from unsiloed_sdk import UnsiloedClient

categories = [
    {"name": "Invoice"},
    {"name": "Receipt"},
    {"name": "Contract"}
]

with UnsiloedClient(api_key="your-api-key") as client:
    # Use classify_and_wait for automatic polling
    result = client.classify_and_wait(
        file="document.pdf",
        categories=categories,
        poll_interval=5,  # Check every 5 seconds
    )

    print(f"Classification: {result.result['classification']}")
    print(f"Confidence: {result.result['confidence']:.2%}")
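
`classify_and_wait` handles polling for you. If you hold only a job ID and need to poll yourself, the loop might look like the sketch below. `fetch_status` is a hypothetical callable standing in for whatever retrieves the job (for example, a thin wrapper around `client.get_classify_result`), and it is assumed to return a dict with `status` and `error` keys as in the status responses documented on this page. The `timeout` parameter is also an assumption.

```python
import time


def poll_until_done(fetch_status, job_id, poll_interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll a job until it completes, fails, or the timeout elapses.

    fetch_status is a hypothetical callable: job_id -> dict with at
    least a "status" key ("processing", "completed", or "failed").
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status!r} after {timeout}s")
        # Wait between checks to stay clear of rate limits.
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays.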

Response Format

Job Creation Response

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "processing",
  "message": "Classification started",
  "quota_remaining": 400
}

Job Status Response (Completed)

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 0.9977573126693524,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "classification": "Medical Record",
        "confidence": 0.9977573126693524
      }
    ]
  }
}

Job Status Response (Processing)

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "processing",
  "progress": "Analyzing document...",
  "error": null
}

Job Status Response (Failed)

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "failed",
  "progress": "Classification failed",
  "error": "Invalid PDF format"
}
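
The three status responses share the same top-level keys, and only the completed one carries a `result` object. A small parser can fold them into one structure. This is a sketch under that assumption. `JobSummary` and `summarize_job` are hypothetical names, not part of the SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSummary:
    job_id: str
    status: str
    classification: Optional[str] = None
    confidence: Optional[float] = None
    error: Optional[str] = None


def summarize_job(payload: str) -> JobSummary:
    """Parse a raw job status JSON string into a JobSummary."""
    data = json.loads(payload)
    # "result" is absent while processing or after a failure.
    result = data.get("result") or {}
    return JobSummary(
        job_id=data["job_id"],
        status=data["status"],
        classification=result.get("classification"),
        confidence=result.get("confidence"),
        error=data.get("error"),
    )
```

For a processing or failed job, `classification` and `confidence` stay `None`, so callers can branch on `status` without guarding every key lookup.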

Common Use Cases

  • Healthcare Documents: Classify medical records, lab reports, prescriptions, and insurance forms
  • Business Documents: Categorize invoices, receipts, purchase orders, and contracts
  • Financial Documents: Sort bank statements, tax forms, and financial reports
  • Legal Documents: Identify contracts, agreements, legal notices, and compliance forms

Error Handling

  • Invalid parameters: Check that categories are properly formatted as a JSON array.
  • Insufficient quota: Check your account quota and upgrade if needed.
  • Job not found: The job ID doesn’t exist or has expired. Verify the job ID is correct.
  • Server error during processing: Retry the request or contact support.
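
Of the errors above, only a server error is worth retrying. Invalid parameters, insufficient quota, and an unknown job ID will fail the same way on every attempt. One way to retry is exponential backoff, sketched below. `TransientError` is a hypothetical exception you would raise (or map SDK errors onto) for retryable failures. How the SDK actually reports server errors is not covered here, so that mapping is an assumption.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a server error)."""


def call_with_retry(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Any other exception propagates immediately, so non-retryable failures are not masked by the retry loop.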

Best Practices

  1. Provide Descriptions: Add descriptions to categories for better classification accuracy
  2. Monitor Quota: Track quota_remaining in responses to avoid hitting limits
  3. Handle Errors: Implement proper error handling and retry logic
  4. Poll Appropriately: Wait 5-10 seconds between status checks to avoid rate limits
  5. Validate Results: Check confidence scores to determine if manual review is needed
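
The last practice can be sketched as a threshold check over a completed job's `result` object. The 0.8 threshold and both function names are illustrative assumptions. Pick a cutoff that suits your own tolerance for misclassification.

```python
def pages_needing_review(result, threshold=0.8):
    """Return page numbers whose confidence falls below the threshold."""
    return [
        page["page"]
        for page in result.get("page_results", [])
        if page["confidence"] < threshold
    ]


def needs_manual_review(result, threshold=0.8):
    """True if the overall score or any page score is below the threshold."""
    if result["confidence"] < threshold:
        return True
    return bool(pages_needing_review(result, threshold))
```

Checking pages as well as the overall score catches mixed documents where a single low-confidence page is hidden by a high aggregate.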