POST /batch/cite

Overview

The Batch Cite endpoint processes multiple PDF files simultaneously, extracting structured data according to a provided JSON schema. This endpoint is designed for high-throughput document processing with citations and structured output.

This endpoint processes files in batches and returns a batch job ID for tracking progress. Use the batch status endpoint to monitor processing and retrieve results.

Request

pdf_files
file[]
required

Array of PDF files to process. Each file should be a valid PDF document.

schema_data
string
required

JSON schema defining the structure of data to extract from the PDFs. Must be a valid JSON string.

batch_size
integer
default: 10

Number of PDFs to process in each batch. Must be a positive integer.

api-key
string
required

API key for authentication

Response

batch_id
string

Unique identifier for the batch processing job

status
string

Initial batch job status (typically "QUEUED")

total_files
integer

Total number of files submitted for processing

message
string

Status message about the batch job creation

Request Examples

curl -X POST "https://visionapi.unsiloed.ai/batch/cite" \
  -H "accept: application/json" \
  -H "api-key: your-api-key" \
  -H "Content-Type: multipart/form-data" \
  -F "pdf_files=@document1.pdf" \
  -F "pdf_files=@document2.pdf" \
  -F "pdf_files=@document3.pdf" \
  -F 'schema_data={"type":"object","properties":{"company_name":{"type":"string","description":"Name of the company"},"board_members":{"type":"array","items":{"type":"string"},"description":"List of board members"}},"required":["company_name"],"additionalProperties":false}' \
  -F "batch_size=10"
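The same request can be made from Python with the `requests` library. This is a sketch, not an official client: the helper names (`build_form_fields`, `submit_batch`) are ours, and the endpoint URL, header, and form-field names are taken from the cURL example above. Note that `schema_data` must be sent as a JSON *string*, so the schema dict is serialized with `json.dumps` before upload.

```python
import json

API_URL = "https://visionapi.unsiloed.ai/batch/cite"


def build_form_fields(schema: dict, batch_size: int = 10) -> dict:
    """Serialize the extraction schema and batch size into form fields.

    schema_data must be a JSON string, and batch_size a positive integer.
    """
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return {"schema_data": json.dumps(schema), "batch_size": str(batch_size)}


def submit_batch(pdf_paths, schema, api_key, batch_size=10):
    """POST the PDFs and schema to /batch/cite and return the parsed JSON body."""
    import requests  # third-party: pip install requests

    # Each tuple is (form field name, (filename, file object, content type)).
    files = [("pdf_files", (p, open(p, "rb"), "application/pdf")) for p in pdf_paths]
    try:
        resp = requests.post(
            API_URL,
            headers={"api-key": api_key, "accept": "application/json"},
            files=files,
            data=build_form_fields(schema, batch_size),
        )
        resp.raise_for_status()
        return resp.json()  # contains batch_id, status, total_files, message
    finally:
        for _, (_, fh, _) in files:
            fh.close()


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"company_name": {"type": "string"}},
        "required": ["company_name"],
        "additionalProperties": False,
    }
    result = submit_batch(["document1.pdf"], schema, "your-api-key")
    print(result["batch_id"], result["status"])
```

Store the returned `batch_id`; it is required for the status checks described below.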

Response Examples

{
  "batch_id": "batch_f7e8d9c2-4a5b-6c7d-8e9f-0a1b2c3d4e5f",
  "status": "QUEUED",
  "total_files": 3,
  "message": "Batch job created successfully. Processing will begin shortly."
}

Schema Format

The schema_data parameter must be a valid JSON schema that defines the structure of data to extract from your PDFs.

Basic Schema Example

{
  "type": "object",
  "properties": {
    "company_name": {
      "type": "string",
      "description": "Name of the company from the document"
    },
    "date": {
      "type": "string",
      "description": "Document date in YYYY-MM-DD format"
    },
    "amount": {
      "type": "number",
      "description": "Total amount or value mentioned"
    }
  },
  "required": ["company_name"],
  "additionalProperties": false
}

Complex Schema Example

{
  "type": "object",
  "properties": {
    "board_of_directors": {
      "type": "array",
      "description": "List of board members",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "Full name of board member"
          },
          "title": {
            "type": "string",
            "description": "Title or position"
          }
        },
        "required": ["name"],
        "additionalProperties": false
      }
    },
    "financial_data": {
      "type": "object",
      "properties": {
        "revenue": {
          "type": "number",
          "description": "Annual revenue"
        },
        "profit": {
          "type": "number",
          "description": "Net profit"
        }
      },
      "additionalProperties": false
    }
  },
  "required": ["board_of_directors"],
  "additionalProperties": false
}
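Because an invalid schema fails the whole batch, it is worth sanity-checking `schema_data` locally before uploading. The helper below is a minimal, stdlib-only sketch (the function name and the specific checks are our own, not part of the API): it confirms the string parses as JSON, that the top level is an object with at least one property, and that every `required` field is actually declared.

```python
import json


def check_schema_string(schema_data: str) -> dict:
    """Parse schema_data and run a few structural sanity checks before upload."""
    schema = json.loads(schema_data)  # raises ValueError if not valid JSON
    if schema.get("type") != "object":
        raise ValueError("top-level schema type should be 'object'")
    props = schema.get("properties")
    if not isinstance(props, dict) or not props:
        raise ValueError("schema must declare at least one property")
    missing = set(schema.get("required", [])) - set(props)
    if missing:
        raise ValueError(f"required fields not in properties: {sorted(missing)}")
    return schema
```

For stricter validation against the JSON Schema specification itself, the third-party `jsonschema` package can be used instead.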

Monitoring Batch Progress

After creating a batch job, use the batch status endpoint to monitor progress:

import requests
import time

def monitor_batch_progress(batch_id, api_key):
    """Monitor batch processing progress"""
    
    headers = {"api-key": api_key}
    status_url = f"https://visionapi.unsiloed.ai/batch/status/{batch_id}"
    
    while True:
        response = requests.get(status_url, headers=headers)
        
        if response.status_code == 200:
            status_data = response.json()
            
            print(f"Batch Status: {status_data['status']}")
            
            if 'statistics' in status_data:
                stats = status_data['statistics']
                print(f"Progress: {stats['completed_jobs']}/{stats['total_jobs']} completed")
                print(f"Failed: {stats['failed_jobs']}")
                print(f"Processing: {stats['processing_jobs']}")
            
            if status_data['status'] in ['COMPLETED', 'COMPLETED_WITH_FAILURES']:
                print("Batch processing finished!")
                return status_data
                
            elif status_data['status'] == 'FAILED':
                print("Batch processing failed!")
                return status_data
                
        else:
            print(f"Error checking status: {response.status_code}")
            
        time.sleep(10)  # Check every 10 seconds

# Usage
batch_id = "your-batch-id"
final_status = monitor_batch_progress(batch_id, "your-api-key")

Error Handling

Common Error Scenarios

  1. Invalid File Format: Only PDF files are supported
  2. Invalid Schema: Schema must be valid JSON with proper structure
  3. Invalid Batch Size: Must be a positive integer
  4. Authentication Error: Invalid or missing API key
  5. File Size Limits: Individual files may have size restrictions

Best Practices

  • Validate Schema: Test your JSON schema with a small batch first
  • File Quality: Use high-quality, text-based PDFs for better extraction
  • Batch Size: Start with smaller batch sizes (5-10 files) for testing
  • Monitor Progress: Regularly check batch status for large jobs
  • Error Recovery: Handle partial failures gracefully

Rate Limits

  • Concurrent Batches: Limited number of active batch jobs per API key
  • File Limits: Maximum number of files per batch may apply
  • Processing Time: Large batches may take significant time to complete

Check your API plan for specific limits and quotas.