Overview
This API provides powerful document processing capabilities using Vision models to extract structured data from PDF documents. It combines computer vision with natural language processing to understand document layouts, identify key information, and extract data according to custom JSON schemas.
Key Features
Structured Data Extraction
Extract specific fields and data points from documents using custom JSON schemas
Bounding Box Detection
Identify and locate visual elements with precise coordinate information
Multi-format Support
Process PDF documents with high-quality image rendering and OCR capabilities
Async Processing
Handle large documents with background processing and job management
How It Works
The Vision API processes documents through several stages:
- PDF Rendering: Converts PDF pages to high-quality images
- Vision Analysis: Uses Vision models to understand document structure
- Data Extraction: Extracts information based on provided JSON schemas
- Bounding Box Detection: Identifies locations of extracted elements
- Result Compilation: Returns structured data with confidence scores and metadata
JSON Schema Format
All extraction schemas must follow the JSON Schema specification:
- type: "object" (root level)
- properties: Object defining the fields to extract
- required: Array of required field names
- additionalProperties: Set to false for strict validation
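The four rules above can be illustrated with a small schema and a checker. This is a minimal sketch: the field names (`invoice_number`, `total_amount`) and the `check_extraction_schema` helper are hypothetical and not part of the API, and the service may enforce further rules that are not covered here.

```python
import json

# Hypothetical example schema; the field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
    "additionalProperties": False,
}


def check_extraction_schema(schema: dict) -> list:
    """Return a list of problems found against the four rules listed above."""
    problems = []
    if schema.get("type") != "object":
        problems.append('root "type" must be "object"')

    properties = schema.get("properties")
    if not isinstance(properties, dict):
        problems.append('"properties" must be an object')
        properties = {}

    required = schema.get("required")
    if not isinstance(required, list):
        problems.append('"required" must be an array of field names')
    else:
        # Every required name should also be declared under "properties".
        for name in required:
            if name not in properties:
                problems.append(f'required field "{name}" is not in "properties"')

    if schema.get("additionalProperties") is not False:
        problems.append('"additionalProperties" should be false for strict validation')
    return problems


print(check_extraction_schema(invoice_schema))
print(json.dumps(invoice_schema, indent=2))
```

Running a check like this locally before submitting a schema may catch structural mistakes early; it does not replace validation against the full JSON Schema specification.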
Basic Usage
Simple Data Extraction
Financial Document Extraction
Advanced Features
Asynchronous Processing
For large documents or batch processing, use the async endpoints:
Complex Schema Definition
Define complex schemas for structured extraction:
Shareholding Pattern Schema
For financial documents with complex shareholding data:
Response Format
Extraction Results Structure
Each extracted field returns an object with detailed metadata:
- value: The extracted data matching the schema type
- score: Confidence score between 0 and 1
- bboxes: Array of bounding box coordinates
- page_no: Page number where data was found
- min_confidence_score: Overall minimum confidence
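To show how these fields might be consumed, the sketch below loads each extracted field into a small dataclass and flags low-confidence values for manual review. The sample payload, the `ExtractedField` class, the `needs_review` helper, the 0.8 threshold, and the assumption that `bboxes` is a list of `[x1, y1, x2, y2]` lists are all illustrative; an actual response may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExtractedField:
    value: Any      # extracted data matching the schema type
    score: float    # confidence score between 0 and 1
    bboxes: list    # assumed here to be a list of [x1, y1, x2, y2] lists
    page_no: int    # page number where the data was found

    @classmethod
    def from_dict(cls, raw: dict) -> "ExtractedField":
        score = float(raw["score"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        return cls(raw["value"], score, list(raw.get("bboxes", [])), int(raw["page_no"]))


def needs_review(fields: dict, threshold: float = 0.8) -> list:
    """Return the names of fields whose score falls below the threshold."""
    return sorted(name for name, field in fields.items() if field.score < threshold)


# Hypothetical payload, shaped after the field list above.
sample = {
    "invoice_number": {"value": "INV-001", "score": 0.97,
                       "bboxes": [[120, 80, 310, 110]], "page_no": 1},
    "total_amount": {"value": 1250.5, "score": 0.62,
                     "bboxes": [[400, 900, 520, 930]], "page_no": 2},
}

fields = {name: ExtractedField.from_dict(raw) for name, raw in sample.items()}
print(needs_review(fields))  # prints ['total_amount']
```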
Standard Extraction Response
Bounding Box Information
Bounding boxes provide precise location data:
- Coordinates are in pixels: [x1, y1, x2, y2]
- x1, y1: Top-left corner
- x2, y2: Bottom-right corner
- Page numbers are 1-indexed
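The coordinate convention above can be handled with a small helper. This is a sketch under stated assumptions: the `BBox` type and `parse_bbox` function are hypothetical, and the code assumes the origin is at the top-left of the page image with y increasing downward, which is consistent with (x1, y1) being the top-left corner and (x2, y2) the bottom-right.

```python
from typing import NamedTuple, Sequence


class BBox(NamedTuple):
    x1: float  # left edge, in pixels
    y1: float  # top edge, in pixels
    x2: float  # right edge, in pixels
    y2: float  # bottom edge, in pixels

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center(self) -> tuple:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def parse_bbox(coords: Sequence[float]) -> BBox:
    """Build a BBox from [x1, y1, x2, y2], rejecting malformed input."""
    if len(coords) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(coords)}")
    box = BBox(*coords)
    if box.x2 < box.x1 or box.y2 < box.y1:
        raise ValueError(f"bottom-right corner precedes top-left corner: {coords}")
    return box


box = parse_bbox([120, 80, 310, 110])
print(box.width, box.height, box.area, box.center)  # prints 190 30 5700 (215.0, 95.0)
```

Because page numbers are 1-indexed, code that stores rendered pages in a zero-based list would look up the image for a box with `pages[page_no - 1]`.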
