POST /batch/cite

Overview

The Batch Cite endpoint processes multiple PDF files simultaneously, extracting structured data according to a provided JSON schema. This endpoint is designed for high-throughput document processing with citations and structured output.

This endpoint processes files in batches and returns a batch job ID for tracking progress. Use the batch status endpoint to monitor processing and retrieve results.

Request

pdf_files
file[]
required

Array of PDF files to process. Each file should be a valid PDF document.

schema_data
string
required

JSON schema defining the structure of data to extract from the PDFs. Must be a valid JSON string.

batch_size
integer
default: 10

Number of PDFs to process in each batch. Must be a positive integer.

api-key
string
required

API key for authentication

Response

batch_id
string

Unique identifier for the batch processing job

status
string

Initial batch job status (typically "QUEUED")

total_files
integer

Total number of files submitted for processing

message
string

Status message about the batch job creation

Request Examples

curl -X POST "https://visionapi.unsiloed.ai/batch/cite" \
  -H "accept: application/json" \
  -H "api-key: your-api-key" \
  -H "Content-Type: multipart/form-data" \
  -F "pdf_files=@document1.pdf" \
  -F "pdf_files=@document2.pdf" \
  -F "pdf_files=@document3.pdf" \
  -F 'schema_data={"type":"object","properties":{"company_name":{"type":"string","description":"Name of the company"},"board_members":{"type":"array","items":{"type":"string"},"description":"List of board members"}},"required":["company_name"],"additionalProperties":false}' \
  -F "batch_size=10"
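The same request can be made from Python with the `requests` library. This is a sketch, not an official client: the helper names (`build_form_fields`, `submit_batch`) are ours, and the endpoint URL, header, and form-field names are taken from the cURL example above. Note that `schema_data` must be sent as a JSON *string*, so the schema dict is serialized with `json.dumps` before upload.

```python
import json

API_URL = "https://visionapi.unsiloed.ai/batch/cite"


def build_form_fields(schema: dict, batch_size: int = 10) -> dict:
    """Serialize the extraction schema and batch size into form fields.

    schema_data must be a JSON string, and batch_size a positive integer.
    """
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return {"schema_data": json.dumps(schema), "batch_size": str(batch_size)}


def submit_batch(pdf_paths, schema, api_key, batch_size=10):
    """POST the PDFs and schema to /batch/cite and return the parsed JSON body."""
    import requests  # third-party: pip install requests

    # Each tuple is (form field name, (filename, file object, content type)).
    files = [("pdf_files", (p, open(p, "rb"), "application/pdf")) for p in pdf_paths]
    try:
        resp = requests.post(
            API_URL,
            headers={"api-key": api_key, "accept": "application/json"},
            files=files,
            data=build_form_fields(schema, batch_size),
        )
        resp.raise_for_status()
        return resp.json()  # contains batch_id, status, total_files, message
    finally:
        for _, (_, fh, _) in files:
            fh.close()


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"company_name": {"type": "string"}},
        "required": ["company_name"],
        "additionalProperties": False,
    }
    result = submit_batch(["document1.pdf"], schema, "your-api-key")
    print(result["batch_id"], result["status"])
```

Store the returned `batch_id`; it is required for the status checks described below.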

Response Examples

{
  "batch_id": "batch_f7e8d9c2-4a5b-6c7d-8e9f-0a1b2c3d4e5f",
  "status": "QUEUED",
  "total_files": 3,
  "message": "Batch job created successfully. Processing will begin shortly."
}

Schema Format

The schema_data parameter must be a valid JSON schema that defines the structure of data to extract from your PDFs.

Basic Schema Example

{
  "type": "object",
  "properties": {
    "company_name": {
      "type": "string",
      "description": "Name of the company from the document"
    },
    "date": {
      "type": "string",
      "description": "Document date in YYYY-MM-DD format"
    },
    "amount": {
      "type": "number",
      "description": "Total amount or value mentioned"
    }
  },
  "required": ["company_name"],
  "additionalProperties": false
}

Complex Schema Example

{
  "type": "object",
  "properties": {
    "board_of_directors": {
      "type": "array",
      "description": "List of board members",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "Full name of board member"
          },
          "title": {
            "type": "string",
            "description": "Title or position"
          }
        },
        "required": ["name"],
        "additionalProperties": false
      }
    },
    "financial_data": {
      "type": "object",
      "properties": {
        "revenue": {
          "type": "number",
          "description": "Annual revenue"
        },
        "profit": {
          "type": "number",
          "description": "Net profit"
        }
      },
      "additionalProperties": false
    }
  },
  "required": ["board_of_directors"],
  "additionalProperties": false
}
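Because an invalid schema fails the whole batch, it is worth sanity-checking `schema_data` locally before uploading. The helper below is a minimal, stdlib-only sketch (the function name and the specific checks are our own, not part of the API): it confirms the string parses as JSON, that the top level is an object with at least one property, and that every `required` field is actually declared.

```python
import json


def check_schema_string(schema_data: str) -> dict:
    """Parse schema_data and run a few structural sanity checks before upload."""
    schema = json.loads(schema_data)  # raises ValueError if not valid JSON
    if schema.get("type") != "object":
        raise ValueError("top-level schema type should be 'object'")
    props = schema.get("properties")
    if not isinstance(props, dict) or not props:
        raise ValueError("schema must declare at least one property")
    missing = set(schema.get("required", [])) - set(props)
    if missing:
        raise ValueError(f"required fields not in properties: {sorted(missing)}")
    return schema
```

For stricter validation against the JSON Schema specification itself, the third-party `jsonschema` package can be used instead.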

Monitoring Batch Progress

After creating a batch job, use the batch status endpoint to monitor progress:

import requests
import time

def monitor_batch_progress(batch_id, api_key):
    """Monitor batch processing progress"""
    
    headers = {"api-key": api_key}
    status_url = f"https://visionapi.unsiloed.ai/batch/status/{batch_id}"
    
    while True:
        response = requests.get(status_url, headers=headers)
        
        if response.status_code == 200:
            status_data = response.json()
            
            print(f"Batch Status: {status_data['status']}")
            
            if 'statistics' in status_data:
                stats = status_data['statistics']
                print(f"Progress: {stats['completed_jobs']}/{stats['total_jobs']} completed")
                print(f"Failed: {stats['failed_jobs']}")
                print(f"Processing: {stats['processing_jobs']}")
            
            if status_data['status'] in ['COMPLETED', 'COMPLETED_WITH_FAILURES']:
                print("Batch processing finished!")
                return status_data
                
            elif status_data['status'] == 'FAILED':
                print("Batch processing failed!")
                return status_data
                
        else:
            print(f"Error checking status: {response.status_code}")
            
        time.sleep(10)  # Check every 10 seconds

# Usage
batch_id = "your-batch-id"
final_status = monitor_batch_progress(batch_id, "your-api-key")

Error Handling

Common Error Scenarios

  1. Invalid File Format: Only PDF files are supported
  2. Invalid Schema: Schema must be valid JSON with proper structure
  3. Invalid Batch Size: Must be a positive integer
  4. Authentication Error: Invalid or missing API key
  5. File Size Limits: Individual files may have size restrictions

Best Practices

  • Validate Schema: Test your JSON schema with a small batch first
  • File Quality: Use high-quality, text-based PDFs for better extraction
  • Batch Size: Start with smaller batch sizes (5-10 files) for testing
  • Monitor Progress: Regularly check batch status for large jobs
  • Error Recovery: Handle partial failures gracefully

Rate Limits

  • Concurrent Batches: Limited number of active batch jobs per API key
  • File Limits: Maximum number of files per batch may apply
  • Processing Time: Large batches may take significant time to complete

Check your API plan for specific limits and quotas.