Process multiple PDF documents and extract structured data using a JSON schema.
The Batch Cite endpoint processes multiple PDF files simultaneously, extracting structured data according to a provided JSON schema. This endpoint is designed for high-throughput document processing with citations and structured output.
This endpoint processes files in batches and returns a batch job ID for tracking progress. Use the batch status endpoint to monitor processing and retrieve results.
Array of PDF files to process. Each file should be a valid PDF document.
JSON schema defining the structure of data to extract from the PDFs. Must be a valid JSON string.
Number of PDFs to process in each batch. Must be a positive integer.
API key for authentication
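The constraints above (valid PDFs, a schema passed as a valid JSON string, a positive integer batch size) can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `validate_batch_inputs` and its argument names are hypothetical, and the check that the schema decodes to a JSON object is an assumption. The PDF check only looks for the standard `%PDF-` file header, so it catches obviously wrong files rather than every malformed PDF.

```python
import json
from pathlib import Path


def validate_batch_inputs(pdf_paths, schema_data, batch_size):
    """Return a list of problems found; an empty list means the inputs look valid.

    Hypothetical helper for client-side checks; the endpoint may apply
    additional or different validation.
    """
    problems = []

    if not pdf_paths:
        problems.append("no files supplied")
    for path in pdf_paths:
        p = Path(path)
        if not p.is_file():
            problems.append(f"{path}: not a file")
            continue
        # PDF files begin with the header "%PDF-".
        with p.open("rb") as fh:
            if fh.read(5) != b"%PDF-":
                problems.append(f"{path}: missing %PDF- header")

    # The schema must be a JSON string. Assumption: it decodes to an object.
    try:
        if not isinstance(json.loads(schema_data), dict):
            problems.append("schema_data: expected a JSON object")
    except (TypeError, ValueError) as exc:
        problems.append(f"schema_data: not valid JSON ({exc})")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size <= 0:
        problems.append("batch size: must be a positive integer")

    return problems
```

Calling the helper with a list of file paths, the schema string, and the batch size returns an empty list when nothing looks wrong, so it can gate the upload with a simple `if not problems:` check.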
Unique identifier for the batch processing job
Initial batch job status (typically “QUEUED”)
Total number of files submitted for processing
Status message about the batch job creation
The schema_data parameter must be a valid JSON schema that defines the structure of data to extract from your PDFs.
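As an illustration, a schema can be built as a Python dictionary and serialized to the JSON string that schema_data expects. This is a sketch under stated assumptions: the invoice fields below are hypothetical and chosen only for the example, and which JSON Schema keywords the endpoint honors is not specified here.

```python
import json

# Hypothetical invoice schema; the property names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}

# Serialize the dictionary to the JSON string passed as schema_data.
schema_data = json.dumps(invoice_schema)
```

Building the schema as a dictionary and serializing it with `json.dumps` avoids hand-written JSON strings, which are a common source of quoting and trailing-comma errors.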
After creating a batch job, use the batch status endpoint to monitor progress and retrieve results.
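Monitoring usually means polling until the job reaches a final state. The sketch below shows one way to structure that loop without assuming anything about the status endpoint itself: `wait_for_batch` is a hypothetical helper, `fetch_status` is a caller-supplied function that queries the batch status endpoint and returns the current status string, and `terminal_statuses` must be filled in from the API's actual status values, since only "QUEUED" is named in this document.

```python
import time


def wait_for_batch(fetch_status, terminal_statuses, interval_s=5.0,
                   timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status.

    fetch_status: caller-supplied callable returning the current status string.
    terminal_statuses: set of status strings that end polling (caller-supplied).
    Raises TimeoutError if no terminal status is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_statuses:
            return status
        # Stop if the next wait would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing the status lookup in as a function keeps the loop independent of any HTTP client and makes the timeout behavior easy to test. A fixed polling interval is used for simplicity; a longer interval or a backoff may be a better fit for large batches.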
Check your API plan for specific limits and quotas.