Overview
Document parsing in our Vision API is achieved through intelligent chunking strategies that analyze document structure using advanced AI and vision language models. The parsing functionality identifies different document elements like text blocks, tables, images, headers, and footers while maintaining proper reading order and extracting content with high accuracy.
Parsing capabilities are accessed through the /parse endpoint, which combines structure detection with content extraction and intelligent segmentation.
Key Features
- Element Detection: Identify and classify document elements using advanced AI models
- Content Extraction: Extract text, tables, and images with appropriate processing methods
- Reading Order: Maintain proper document flow and reading sequence
- Multi-Modal Processing: Handle text, images, tables, and formulas with specialized extractors
Document Element Types
The parsing system can identify and process the following element types through chunking strategies:
Text Elements
- Text: Regular paragraph text
- Title: Document and section titles
- Section-header: Section headings
- Page-header: Header content
- Page-footer: Footer content
- Caption: Image and table captions
- Footnote: Footnote references and content
- List-item: Bulleted and numbered lists
Visual Elements
- Table: Structured tabular data
- Picture: Images, charts, and diagrams
- Formula: Mathematical equations and expressions
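The element types above can be grouped on the client side when post-processing results. The following is a minimal sketch, not part of the API: the function name `element_group` is hypothetical, and the name normalization is an assumption added because the segment_type examples in the response fields use spellings such as SectionHeader while the list above uses Section-header.

```python
# Element type names taken from the lists above.
TEXT_TYPES = {
    "Text", "Title", "Section-header", "Page-header",
    "Page-footer", "Caption", "Footnote", "List-item",
}
VISUAL_TYPES = {"Table", "Picture", "Formula"}


def _normalize(name: str) -> str:
    """Lowercase and drop separators so 'Section-header' matches 'SectionHeader'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


_TEXT_KEYS = {_normalize(name) for name in TEXT_TYPES}
_VISUAL_KEYS = {_normalize(name) for name in VISUAL_TYPES}


def element_group(segment_type: str) -> str:
    """Return 'text', 'visual', or 'unknown' for a segment type name (hypothetical helper)."""
    key = _normalize(segment_type)
    if key in _TEXT_KEYS:
        return "text"
    if key in _VISUAL_KEYS:
        return "visual"
    return "unknown"
```

Returning "unknown" rather than raising keeps client code working if the service reports a type that is not listed here.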
Quick Start
Using Local Files
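A local file upload might look like the sketch below, which only builds the request and uses the Python standard library. Several details are assumptions, not documented values: the base URL, the bearer-token Authorization header, and the multipart field name `file` are placeholders, so check the Parse API Reference for the real ones.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Placeholders (assumptions, not documented values).
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_parse_upload(path: str, field_name: str = "file") -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST to /parse."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"

    # One multipart part carrying the file bytes, then the closing boundary.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_path.read_bytes() + tail

    return urllib.request.Request(
        url=f"{BASE_URL}/parse",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.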
Using Presigned URLs
Instead of uploading files directly, you can provide a presigned URL using the url parameter. This is ideal for documents already stored in cloud storage (S3, GCS, Azure Blob, etc.).
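As a rough illustration, the request below passes a presigned URL through the url parameter. Only the /parse path and the url parameter come from this page. The base URL, the Authorization header, and the choice of a JSON body (rather than a form field) are assumptions made for the sketch.

```python
import json
import urllib.request

# Placeholders (assumptions, not documented values).
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_parse_url_request(presigned_url: str) -> urllib.request.Request:
    """Build, but do not send, a POST to /parse that references a remote document."""
    payload = json.dumps({"url": presigned_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/parse",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",    # assumed body encoding
        },
    )
```

Because presigned URLs expire, generate the URL shortly before submitting the request so that it is still valid when the service fetches the document.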
For detailed API parameters and configuration options, see the Parse API Reference.
Response Format
Parsed content is organized into chunks with segments containing rich metadata, bounding boxes, and extracted content.
Response Fields
Top-Level Fields
- job_id: Unique identifier for the parsing job
- status: Job status (Succeeded, Failed, Processing, etc.)
- file_name: Name of the uploaded file
- total_chunks: Number of content chunks generated
- page_count: Total number of pages in the document
- pdf_url: Temporary S3 URL to access the processed PDF
- metadata: Additional processing metadata and settings
Chunk Fields
- chunk_id: Unique identifier for each chunk
- chunk_length: Character length of the chunk
- embed: Combined markdown content from all segments in the chunk (ideal for RAG/embeddings)
- segments: Array of document elements within the chunk
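Since embed is described as ideal for RAG and embeddings, one possible way to prepare chunks for an embedding pipeline is sketched below. The function name `embedding_records` and the output record shape are hypothetical. The function takes the list of chunk objects directly, because the name of the top-level key that holds the chunks is not listed on this page.

```python
def embedding_records(chunks: list[dict]) -> list[dict]:
    """Turn parsed chunks into {id, text, pages} records (hypothetical shape)."""
    records = []
    for chunk in chunks:
        text = (chunk.get("embed") or "").strip()
        if not text:
            continue  # skip chunks with no embeddable content
        # Collect the distinct pages that the chunk's segments appear on.
        pages = sorted(
            {
                segment["page_number"]
                for segment in chunk.get("segments", [])
                if "page_number" in segment
            }
        )
        records.append({"id": chunk.get("chunk_id"), "text": text, "pages": pages})
    return records
```

Keeping the page numbers next to the text lets a Q&A application cite where an answer came from.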
Segment Fields
- segment_id: Unique identifier for each segment
- segment_type: Element classification (Text, Title, Table, Picture, SectionHeader, etc.)
- content: Plain text content (for text segments)
- markdown: Markdown-formatted content
- html: HTML-formatted content
- image: S3 URL for visual elements (Picture, Table)
- bbox: Bounding box with left, top, width, height coordinates
- page_number: Page where the segment appears
- page_width / page_height: Page dimensions for coordinate reference
- confidence: AI model confidence score (0-1) for element detection
- ocr: Array of OCR results with word-level text, bounding boxes, and colors
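The bbox and page dimension fields can be combined to get resolution-independent coordinates, and confidence can be used to drop uncertain detections. The sketch below assumes that bbox values share units with page_width and page_height and that the origin is the top-left corner of the page. Both helper names and the 0.8 threshold are illustrative, not part of the API.

```python
def normalized_bbox(segment: dict) -> tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) as fractions of the page size (assumes shared units)."""
    bbox = segment["bbox"]
    page_w = segment["page_width"]
    page_h = segment["page_height"]
    x0 = bbox["left"] / page_w
    y0 = bbox["top"] / page_h
    x1 = (bbox["left"] + bbox["width"]) / page_w
    y1 = (bbox["top"] + bbox["height"]) / page_h
    return (x0, y0, x1, y1)


def confident_segments(segments: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep segments whose confidence is at least `threshold` (arbitrary default)."""
    return [
        segment
        for segment in segments
        if (segment.get("confidence") or 0.0) >= threshold
    ]
```

Normalized coordinates are convenient for drawing overlays on a page image rendered at any resolution: multiply each fraction by the rendered width or height.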
Use Cases
- Document Q&A: Extract structured content for RAG applications
- Content Migration: Parse and convert documents to markdown or HTML
- Data Extraction: Identify and extract specific document sections
- Archive Processing: Batch process large document collections
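For the content migration use case, a minimal sketch of assembling one markdown document from parsed output could look like this. It relies only on the segments, markdown, and content fields described above. The function name `chunks_to_markdown` is hypothetical, and the function takes the list of chunk objects directly rather than the full response.

```python
def chunks_to_markdown(chunks: list[dict]) -> str:
    """Join segment markdown in the order returned, falling back to plain content."""
    parts = []
    for chunk in chunks:
        for segment in chunk.get("segments", []):
            text = (segment.get("markdown") or segment.get("content") or "").strip()
            if text:
                parts.append(text)
    # Separate blocks with a blank line, as markdown expects.
    return "\n\n".join(parts) + "\n"
```

The sketch keeps the segment order exactly as returned, on the assumption that it already reflects the reading order maintained by the parser.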

