Overview
This API provides powerful document processing capabilities using Vision models to extract structured data from PDF documents. It combines computer vision with natural language processing to understand document layouts, identify key information, and extract data according to custom JSON schemas.
Key Features
Structured Data Extraction
Extract specific fields and data points from documents using custom JSON schemas
Bounding Box Detection
Identify and locate visual elements with precise coordinate information
Multi-format Support
Process 20+ document modalities (PDFs, PPTs, spreadsheets, images, and more) with image-based rendering, OCR, and structure-preserving parsing.
Async Processing
Handle large documents with background processing and job management
Getting Started with the /extract Endpoint
Here are the steps to get started:
- Get an API Key - Sign up on Unsiloed AI to get API access
- Define a JSON Schema - Specify the fields you want to extract. In the Unsiloed AI Platform, you can directly generate a JSON schema through the UI and export that schema as an endpoint to call the /extract endpoint
Defining Extraction Schemas
Unsiloed AI uses JSON Schema to define what data should be extracted from a document. You describe the structure you want, and the extraction engine returns structured JSON with citations, bounding boxes, and confidence scores for each field.
Important: When defining a schema, keep field descriptions as detailed and specific as possible. Clear, pointed descriptions help the model correctly locate and extract the intended information, especially in complex or ambiguous documents.
Schema Rules - Detailed Guide
All extraction schemas must follow the JSON Schema specification, with strict constraints to ensure deterministic, production-safe outputs.
Core Requirements
1. Root Object: Every schema must start with "type": "object". Arrays or primitives are not allowed at the root level.
2. Define all fields under the "properties" key. Each field must specify a "type" and should include a clear "description".
3. List mandatory fields in the "required" array. Field names must exactly match those defined in "properties".
4. Set "additionalProperties": false at every object level to ensure only specified fields appear in the output.
Supported Types
String - For text, dates, IDs, names, addresses, and any textual data
Array - Use "items" to define the structure of array elements
Building Schemas
Primitive Types
For simple fields like strings, numbers, and booleans, declare the type directly, for example "type": "string". These are the building blocks of your schema.
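As a minimal sketch of a schema built only from primitive fields, the Python dictionary below serializes to a JSON Schema that follows the core requirements. The field names are hypothetical, and the type names "number" and "boolean" are the standard JSON Schema spellings, assumed here to be accepted alongside "string".

```python
import json

# Hypothetical field names, chosen only for illustration.
primitive_schema = {
    "type": "object",
    "properties": {
        "customer_name": {
            "type": "string",
            "description": "Full name of the customer as printed in the document header",
        },
        "total_amount": {
            "type": "number",
            "description": "Final amount due, including tax, without currency symbol",
        },
        "is_paid": {
            "type": "boolean",
            "description": "True if the document is marked as paid",
        },
    },
    "required": ["customer_name", "total_amount", "is_paid"],
    "additionalProperties": False,  # serialized as JSON false
}

# Serialize to the JSON text you would submit as the schema.
schema_json = json.dumps(primitive_schema, indent=2)
print(schema_json)
```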
Arrays of Objects
Use arrays when you have repeating data like line items, transactions, or people.
Nested Arrays
You can nest arrays inside objects within arrays for complex hierarchical data.
Example 1: Invoice Extraction
A common schema for extracting data from US invoices.
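The schema below is an illustrative sketch, not the original example from this page. It shows how an invoice schema might combine primitive fields with an array of objects for line items. All field names and descriptions are assumptions chosen for illustration.

```python
import json

# One line item: a strict object nested inside the "line_items" array.
line_item = {
    "type": "object",
    "properties": {
        "description": {"type": "string", "description": "Text describing the product or service on this row"},
        "quantity": {"type": "number", "description": "Number of units billed on this row"},
        "unit_price": {"type": "number", "description": "Price per unit in US dollars"},
        "amount": {"type": "number", "description": "Row total in US dollars"},
    },
    "required": ["description", "amount"],
    "additionalProperties": False,
}

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier printed near the top of the invoice"},
        "invoice_date": {"type": "string", "description": "Date the invoice was issued, as written on the invoice"},
        "vendor_name": {"type": "string", "description": "Legal name of the company issuing the invoice"},
        "total_amount": {"type": "number", "description": "Grand total due in US dollars, including tax"},
        "line_items": {
            "type": "array",
            "description": "Every billed row in the invoice's itemized table",
            "items": line_item,
        },
    },
    "required": ["invoice_number", "total_amount", "line_items"],
    "additionalProperties": False,
}

print(json.dumps(invoice_schema, indent=2))
```

Note that "additionalProperties": false appears on both the root object and the line-item object, since the rule applies at every object level.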
Example 2: Public Company Filing (10-K / Annual Report)
A schema for extracting governance and ownership information from US SEC filings.
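The sketch below stands in for the kind of schema described here and is not the original example. It uses three small hypothetical helpers (field, strict_object, array_of) to build nested arrays: each director object contains its own array of committee objects. Field names are assumptions, and the helper marks every property as required, which is a simplification you may not want.

```python
import json
from typing import Any


def field(type_name: str, description: str) -> dict[str, Any]:
    """Build a primitive field definition."""
    return {"type": type_name, "description": description}


def strict_object(properties: dict[str, Any]) -> dict[str, Any]:
    """Build an object that marks all properties required and forbids extras."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


def array_of(item_schema: dict[str, Any], description: str) -> dict[str, Any]:
    """Build an array whose elements follow item_schema."""
    return {"type": "array", "description": description, "items": item_schema}


committee = strict_object({
    "committee_name": field("string", "Name of the board committee, e.g. Audit Committee"),
    "is_chair": field("boolean", "True if the director chairs this committee"),
})

director = strict_object({
    "name": field("string", "Director's full name as listed in the filing"),
    "title": field("string", "Position held, e.g. Chairman or Independent Director"),
    "committees": array_of(committee, "Board committees this director serves on"),
})

shareholder = strict_object({
    "holder_name": field("string", "Name of the beneficial owner"),
    "percent_of_class": field("number", "Percentage of the share class owned"),
})

filing_schema = strict_object({
    "company_name": field("string", "Registrant name on the filing cover page"),
    "fiscal_year_end": field("string", "Fiscal year end date stated on the cover page"),
    "directors": array_of(director, "All members of the board of directors"),
    "major_shareholders": array_of(shareholder, "Beneficial owners listed in the ownership table"),
})

print(json.dumps(filing_schema, indent=2))
```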
How Extraction Works
Once you submit a schema:
- Unsiloed locates each field in the document
- Extracts values that best match the schema
- Returns structured JSON with:
  - Field-level confidence scores
  - Word-level citations
  - Bounding boxes mapped to the original document
Best Practices
- Keep field names simple and descriptive
- Use nested objects to reflect document structure
- Avoid free-form schemas—strict schemas produce better results
- Prefer arrays for repeated sections (line items, directors, transactions)
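Because strict schemas matter, it can help to check a schema locally before submitting it. The function below is a rough sketch of such a check, covering the core requirements (object at the root, a "properties" key, "required" names that match, and "additionalProperties": false at every object level). The name check_schema is hypothetical and is not part of the Unsiloed API. The service may enforce additional rules.

```python
from typing import Any


def check_schema(schema: dict[str, Any], path: str = "root") -> list[str]:
    """Return a list of rule violations; an empty list means no problems were found."""
    problems: list[str] = []

    # Rule: every object level (including the root) must be "type": "object".
    if schema.get("type") != "object":
        return [f'{path}: "type" must be "object"']

    properties = schema.get("properties")
    if not isinstance(properties, dict):
        problems.append(f'{path}: missing "properties"')
        properties = {}

    # Rule: names in "required" must exactly match keys in "properties".
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f'{path}: required field "{name}" is not in "properties"')

    # Rule: extra fields must be disallowed at every object level.
    if schema.get("additionalProperties") is not False:
        problems.append(f'{path}: "additionalProperties" must be false')

    for name, definition in properties.items():
        field_path = f"{path}.{name}"
        if not isinstance(definition, dict) or "type" not in definition:
            problems.append(f'{field_path}: missing "type"')
            continue
        if definition["type"] == "object":
            problems.extend(check_schema(definition, field_path))
        elif definition["type"] == "array":
            items = definition.get("items")
            # Recurse into arrays of objects so nested levels are checked too.
            if isinstance(items, dict) and items.get("type") == "object":
                problems.extend(check_schema(items, f"{field_path}[]"))

    return problems


example = {
    "type": "object",
    "properties": {"name": {"type": "string", "description": "Full name"}},
    "required": ["name", "age"],
}
for problem in check_schema(example):
    print(problem)
```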
Basic Usage
Simple Data Extraction
Advanced Features
Asynchronous Processing
For large documents or batch processing, use the async endpoints:
Response Format
Extraction Results Structure
The extraction API returns a complete job response with metadata and extracted results:
Response Fields
Top-Level Fields
- job_id: Unique identifier for the extraction job
- status: Job status (completed, processing, failed, etc.)
- file_name: Name of the processed file
- created_at: Timestamp when the job was created
- updated_at: Timestamp of the last status update
- result: Object containing all extracted fields
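A small sketch of how client code might act on these top-level fields is shown below. It assumes the job response has already been parsed into a Python dictionary. The function name, the exception class, and the sample values are hypothetical, and the timestamp format is an assumption.

```python
from typing import Any


class ExtractionNotReady(Exception):
    """Raised when a job has not reached a terminal status yet."""


def result_or_raise(job: dict[str, Any]) -> dict[str, Any]:
    """Return the extracted fields for a completed job, otherwise raise."""
    status = job.get("status")
    if status == "completed":
        return job["result"]
    if status == "failed":
        raise RuntimeError(f"Extraction job {job.get('job_id')} failed")
    # Any other status (for example "processing") means: try again later.
    raise ExtractionNotReady(f"Job {job.get('job_id')} is still {status}")


# Hypothetical sample response using the documented top-level keys.
sample_job = {
    "job_id": "job-123",
    "status": "completed",
    "file_name": "invoice.pdf",
    "created_at": "2025-01-01T10:00:00Z",
    "updated_at": "2025-01-01T10:00:30Z",
    "result": {},
}

fields = result_or_raise(sample_job)
print(f"{sample_job['file_name']}: {len(fields)} extracted fields")
```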
Extracted Field Structure
Each field in the result object contains:
- value: The extracted data (type matches your schema: string, number, object, array)
- score: Confidence score between 0 and 1 (higher is better)
- page_no: Page number where the data was found (1-indexed)
- bboxes: Array of bounding box objects with location information
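One common use of the score field is to route uncertain extractions to manual review. The sketch below assumes a parsed result object shaped as described above. The sample data, the function name, and the 0.8 threshold are illustrative assumptions, not recommended values.

```python
from typing import Any


def low_confidence_fields(result: dict[str, Any], threshold: float = 0.8) -> dict[str, float]:
    """Map field name to score for every field scoring below the threshold."""
    return {
        name: field["score"]
        for name, field in result.items()
        if field["score"] < threshold
    }


# Hypothetical sample using the documented per-field keys.
sample_result = {
    "invoice_number": {"value": "INV-001", "score": 0.97, "page_no": 1, "bboxes": []},
    "total_amount": {"value": 1250.0, "score": 0.62, "page_no": 2, "bboxes": []},
}

for name, score in low_confidence_fields(sample_result).items():
    page = sample_result[name]["page_no"]
    print(f"Review '{name}' on page {page} (score {score})")
```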
Bounding Box Structure
Each bounding box in the bboxes array includes:
- bbox: Pixel coordinates [left, top, right, bottom]
  - left, top: Top-left corner coordinates
  - right, bottom: Bottom-right corner coordinates
- type: Either "segment" (document region) or "ocr" (word-level text)
- confidence: Segment confidence score (for segment type) or null (for OCR type)
- page_width: Page width in pixels (for coordinate reference)
- page_height: Page height in pixels (for coordinate reference)
- text: Extracted text (only present for OCR-type bboxes)
Each extracted field typically has two bounding boxes: one for the document segment containing the data, and one for the precise OCR text location. This dual-level citation allows you to trace extractions back to both the visual region and the exact words in the document.
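To draw these boxes on a page rendered at a different resolution, the pixel coordinates can be scaled by page_width and page_height. The sketch below assumes the field structure described above. The helper names and the sample values are hypothetical.

```python
from typing import Any


def normalize_bbox(box: dict[str, Any]) -> tuple[float, float, float, float]:
    """Convert pixel [left, top, right, bottom] into fractions of the page size."""
    left, top, right, bottom = box["bbox"]
    width, height = box["page_width"], box["page_height"]
    return (left / width, top / height, right / width, bottom / height)


def boxes_of_type(field: dict[str, Any], box_type: str) -> list[dict[str, Any]]:
    """Select the bounding boxes of one type: "segment" or "ocr"."""
    return [box for box in field["bboxes"] if box["type"] == box_type]


# Hypothetical extracted field with one segment box and one OCR box.
sample_field = {
    "value": "INV-001",
    "score": 0.97,
    "page_no": 1,
    "bboxes": [
        {"bbox": [100, 200, 900, 600], "type": "segment", "confidence": 0.91,
         "page_width": 1000, "page_height": 2000},
        {"bbox": [250, 500, 500, 550], "type": "ocr", "confidence": None,
         "page_width": 1000, "page_height": 2000, "text": "INV-001"},
    ],
}

for box in boxes_of_type(sample_field, "ocr"):
    print(box["text"], normalize_bbox(box))
```

Multiplying the normalized values by the width and height of your own rendering gives the rectangle to highlight.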

