curl -X POST "https://prod.visionapi.unsiloed.ai/v2/extract" \
-H "accept: application/json" \
-H "api-key: your-api-key" \
-H "Content-Type: multipart/form-data" \
-F "pdf_file=@document.pdf;type=application/pdf" \
-F "schema_data={\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\",\"description\":\"Document title\"},\"date\":{\"type\":\"string\",\"description\":\"Document date\"}},\"required\":[\"title\",\"date\"],\"additionalProperties\":false}"
{
"job_id": "945b4578-691f-4c74-8184-dde654093b11",
"status": "queued",
"message": "PDF citation processing started",
"quota_remaining": 48988
}
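The same request can be sketched in Python using only the standard library. Building schema_data with json.dumps avoids hand-escaping the quotes, which the curl command has to do. The multipart encoder is a simplified illustration: it does not escape quotes in field names or filenames. encode_multipart and submit_extraction are hypothetical helpers, not part of an official client.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://prod.visionapi.unsiloed.ai/v2/extract"


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: {name: str}; files: {name: (filename, bytes, content_type)}.
    Simplified sketch: names and filenames are not escaped.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode("utf-8") + b"\r\n")
    for name, (filename, data, content_type) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit_extraction(pdf_path, schema, api_key):
    """POST a local PDF and a schema dict to /v2/extract; return the parsed JSON response."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    body, content_type = encode_multipart(
        {"schema_data": json.dumps(schema)},
        {"pdf_file": (os.path.basename(pdf_path), pdf_bytes, "application/pdf")},
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "api-key": api_key,
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

A client library such as requests can build the multipart body for you; the hand-rolled encoder only shows what goes over the wire.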
Extract structured data from PDF documents using custom schemas
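The /v2/extract endpoint extracts structured data from PDF documents according to a JSON schema you supply. It supports optional bounding box citations. Processing is asynchronous: the request returns a job_id with status "queued", and you poll for the result once the job completes.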
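The submit response is small, so it can be captured in a typed record before polling. A minimal sketch using the four fields shown in the sample response; ExtractionJob is a hypothetical name, and quota_remaining is treated as optional in case a response omits it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionJob:
    """Fields from the POST /v2/extract response shown in the sample."""

    job_id: str
    status: str
    message: str = ""
    quota_remaining: Optional[int] = None

    @classmethod
    def from_response(cls, payload: dict) -> "ExtractionJob":
        # Keep only the documented keys; any extra keys are ignored.
        quota = payload.get("quota_remaining")
        return cls(
            job_id=payload["job_id"],
            status=payload["status"],
            message=payload.get("message", ""),
            quota_remaining=int(quota) if quota is not None else None,
        )
```

The job_id from the record is what you pass to the status endpoint when polling.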
Either pdf_file or file_url must be provided.
Model tiers: alpha, beta, gamma, delta. Recommended: gamma (the default), which offers the best balance of accuracy and speed.
When enable_citations is set to true, each extracted field includes bboxes with precise location data in the source document.
Example schema:
{
"type": "object",
"properties": {
"Individuals": {
"type": "string",
"description": "Percentage Holding"
},
"LIC of India": {
"type": "string",
"description": "No of Shares Held"
},
"United bank of india": {
"type": "string",
"description": "No of shares held by United bank of india"
}
},
"required": [
"Individuals",
"LIC of India",
"United bank of india"
],
"additionalProperties": false
}
Example response for this schema with enable_citations set to true:
{
"Individuals": {
"score": 0.9998314521743098,
"value": "10.57",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"LIC of India": {
"score": 0.9999889986487799,
"value": "1515000",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"United bank of india": {
"score": 0.999984548437705,
"value": "500000",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"min_confidence_score": 0.9998314521743098
}
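Each field carries its own score, and min_confidence_score in this sample equals the lowest field score. A minimal sketch for flagging fields that fall below a review threshold; the function names and the 0.9 default threshold are illustrative assumptions, and the sketch only handles top-level fields shaped like the ones in this sample.

```python
def field_scores(result):
    """Map each extracted field to its score; non-field keys such as min_confidence_score are skipped."""
    return {
        name: field["score"]
        for name, field in result.items()
        if isinstance(field, dict) and "score" in field
    }


def low_confidence_fields(result, threshold=0.9):
    """Return names of fields scoring below threshold, lowest score first."""
    scores = field_scores(result)
    flagged = [name for name, score in scores.items() if score < threshold]
    return sorted(flagged, key=scores.__getitem__)
```

Flagged fields are natural candidates for manual review, for example by highlighting their bboxes in a viewer.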
The enable_citations parameter controls whether bounding box coordinates are returned with extracted data. Citations provide references back to the source document, allowing you to trace where each extracted value was found.
When enable_citations is set to true, each extracted field includes bboxes with precise location data:
{
"invoice_number": {
"value": "INV-2025-001",
"page_no": 1,
"score": 0.97,
"bboxes": [
{
"bbox": [139, 209, 280, 222],
"text": "INV-2025-001",
"confidence": 0.95,
"page_width": 595.0,
"page_height": 842.0
}
]
}
}
- bbox: [left, top, right, bottom] in PDF point space (origin: top-left)
- page_width / page_height: included for scaling to any display size
When enable_citations is false (the default), the response contains value, score, and page_no for each field, without bounding box data:
{
"invoice_number": {
"value": "INV-2025-001",
"page_no": 1,
"score": 0.97
}
}
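Because each citation entry carries page_width and page_height, a bbox can be mapped onto a rendered page of any size for UI highlighting. A minimal sketch, assuming the rendered image keeps the top-left origin described for bbox; scale_bbox and highlight_boxes are hypothetical helpers. It expects page_width and page_height on every citation entry, as in the invoice sample; entries without them would need the page size from another source.

```python
def scale_bbox(bbox, page_width, page_height, display_width, display_height):
    """Scale a [left, top, right, bottom] box from PDF points to display pixels."""
    sx = display_width / page_width
    sy = display_height / page_height
    left, top, right, bottom = bbox
    return [left * sx, top * sy, right * sx, bottom * sy]


def highlight_boxes(field, display_width, display_height):
    """Return scaled boxes for one extracted field; empty when citations are off."""
    return [
        scale_bbox(c["bbox"], c["page_width"], c["page_height"], display_width, display_height)
        for c in field.get("bboxes", [])
    ]
```

For a page rendered at twice the PDF size (1190 x 1684 pixels for a 595 x 842 point page), the invoice_number box [139, 209, 280, 222] becomes [278.0, 418.0, 560.0, 444.0].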
Set enable_citations to true when you need to trace extracted values back to their exact location in the document, such as for UI highlighting or audit trails.
The schema_data parameter must be a valid JSON Schema that defines the structure of the data to extract. All schemas must follow the JSON Schema specification, with proper type definitions, properties, and constraints.
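A schema is built from these keywords:
- type: "object" at the root level
- properties: an object defining the fields to extract
- required: an array of required field names
- additionalProperties: set to false for strict validation
Example schema with nested arrays of objects:
{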
"type": "object",
"properties": {
"Individuals": {
"type": "string",
"description": "Percentage Holding"
},
"LIC of India": {
"type": "number",
"description": "No of Shares Held"
},
"board of directors": {
"type": "array",
"description": "list of names of board of directors",
"items": {
"type": "object",
"required": [
"names of board of directors"
],
"properties": {
"names of board of directors": {
"type": "string",
"description": "names of all the members of board of directors of ACRE"
}
},
"additionalProperties": false
}
},
"shareholding pattern": {
"type": "array",
"description": "shareholding pattern",
"items": {
"type": "object",
"required": [
"name of shareholders",
"number of shares held"
],
"properties": {
"name of shareholders": {
"type": "string",
"description": "name of the shareholders in ACRE Table"
},
"number of shares held": {
"type": "string",
"description": "numbers of shares held by shareholders in ACRE Table"
}
},
"additionalProperties": false
}
}
},
"required": [
"Individuals",
"LIC of India",
"board of directors",
"shareholding pattern"
],
"additionalProperties": false
}
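Before submitting, it can save a round trip to check a schema against the rules listed above. A minimal recursive sketch; strict_schema_problems is a hypothetical helper, it applies the additionalProperties: false recommendation to every nested object as well as the root, and it is not a full JSON Schema validator.

```python
def strict_schema_problems(schema, path="$"):
    """Return a list of rule violations found in schema (empty if none)."""
    problems = []
    kind = schema.get("type")
    if path == "$" and kind != "object":
        problems.append("$: root type must be 'object'")
    if kind == "object":
        props = schema.get("properties")
        if not isinstance(props, dict):
            problems.append(f"{path}: object needs 'properties'")
            props = {}
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: set additionalProperties to false")
        # Every required name must be defined under properties.
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required field {name!r} not in properties")
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif kind == "array":
        if "items" not in schema:
            problems.append(f"{path}: array needs 'items'")
        else:
            problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems
```

Paths in the messages use $ for the root, dots for nested properties, and [] for array items, so a problem inside the board of directors items would be reported under "$.board of directors[]".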
{
"type": "object",
"properties": {
"shares held by Punjab National bank": {
"type": "string",
"description": "shares held by Punjab National bank"
},
"shares held by IFCI": {
"type": "string",
"description": "shares held by IFCI"
},
"shareholding pattern": {
"type": "object",
"description": "shareholding pattern",
"properties": {
"Percentage holding": {
"type": "array",
"description": "percentage holding of shareholders in ACRE",
"items": {
"type": "string",
"description": "percentage holding of shareholders in ACRE"
}
},
"Name of shareholders": {
"type": "array",
"description": "Names of shareholders in ACRE",
"items": {
"type": "string",
"description": "Names of shareholders in ACRE"
}
}
},
"required": ["Percentage holding", "Name of shareholders"],
"additionalProperties": false
},
"names of board of directors": {
"type": "array",
"description": "list of names of members of board of directors in ACRE",
"items": {
"type": "object",
"properties": {
"names of board of directors": {
"type": "string",
"description": "list of names of members of board of directors in ACRE"
}
},
"required": ["names of board of directors"],
"additionalProperties": false
}
}
},
"required": [
"shares held by Punjab National bank",
"shares held by IFCI",
"shareholding pattern",
"names of board of directors"
],
"additionalProperties": false
}
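This schema captures shareholders and their percentages as two parallel arrays, whereas the previous schema uses an array of objects that keeps each name together with its share count. With parallel arrays, pairing the entries is left to you. A minimal sketch, assuming both arrays come back as plain lists of strings in matching order (the result shape for array fields is not shown in this section); pair_shareholdings is a hypothetical helper and the sample values are made up.

```python
def pair_shareholdings(names, percentages):
    """Pair parallel name and percentage lists into records; refuse mismatched lengths."""
    if len(names) != len(percentages):
        raise ValueError(
            f"{len(names)} names but {len(percentages)} percentages; "
            "the arrays cannot be paired reliably"
        )
    return [{"name": name, "percentage": pct} for name, pct in zip(names, percentages)]
```

Raising on a length mismatch is deliberate: silently truncating with zip would attach percentages to the wrong shareholders. The array-of-objects form avoids the problem entirely.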
{
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Document title or paper title"
},
"authors": {
"type": "array",
"description": "List of author names",
"items": {
"type": "string"
}
},
"publication_date": {
"type": "string",
"description": "Publication date in YYYY-MM-DD format"
},
"journal_name": {
"type": "string",
"description": "Name of journal or publication venue"
},
"doi": {
"type": "string",
"description": "Digital Object Identifier"
},
"abstract": {
"type": "string",
"description": "Document abstract or summary"
},
"keywords": {
"type": "array",
"description": "Key terms and subject keywords",
"items": {
"type": "string"
}
},
"references": {
"type": "array",
"description": "List of cited references",
"items": {
"type": "string"
}
}
},
"required": ["title", "authors"],
"additionalProperties": false
}
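The publication_date description asks for YYYY-MM-DD, but a description guides extraction and is not a format guarantee (an assumption here), and the field is not in required, so it may be absent. A minimal sketch for checking the value after extraction; parse_publication_date is a hypothetical helper. The regular expression comes first because recent Python versions' date.fromisoformat also accepts other ISO forms such as 20250115.

```python
import re
from datetime import date


def parse_publication_date(value):
    """Return a date for a strict YYYY-MM-DD string, else None."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:  # well-formed but impossible, e.g. 2025-02-30
        return None
```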
{
"type": "object",
"properties": {
"document_type": {
"type": "string",
"description": "Type of legal document (contract, agreement, etc.)"
},
"parties": {
"type": "array",
"description": "Names of parties involved",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Party name"
},
"role": {
"type": "string",
"description": "Party role (e.g., buyer, seller, contractor)"
}
},
"required": ["name", "role"],
"additionalProperties": false
}
},
"effective_date": {
"type": "string",
"description": "Document effective date"
},
"key_terms": {
"type": "array",
"description": "Important terms and conditions",
"items": {
"type": "string"
}
},
"obligations": {
"type": "array",
"description": "Key obligations and responsibilities",
"items": {
"type": "object",
"properties": {
"party": {
"type": "string",
"description": "Party responsible for the obligation"
},
"obligation": {
"type": "string",
"description": "Description of the obligation"
}
},
"required": ["party", "obligation"],
"additionalProperties": false
}
}
},
"required": ["document_type", "parties", "effective_date"],
"additionalProperties": false
}
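Schemas like these repeat the same object boilerplate (a required list plus additionalProperties: false) at every level. A small set of hypothetical helpers can generate it in Python, which keeps required in sync with properties; these helpers are not part of the API. strict_object defaults required to every property, matching the item schemas above, where all fields are required.

```python
def strict_object(properties, required=None, description=None):
    """Build an object schema with additionalProperties set to false."""
    schema = {"type": "object"}
    if description is not None:
        schema["description"] = description
    schema["properties"] = properties
    schema["required"] = list(properties) if required is None else list(required)
    schema["additionalProperties"] = False
    return schema


def string_field(description):
    """A string property with a description."""
    return {"type": "string", "description": description}


def array_of(items, description):
    """An array property whose elements follow the items schema."""
    return {"type": "array", "description": description, "items": items}
```

For example, the parties field above is array_of(strict_object({"name": string_field("Party name"), "role": string_field("Party role (e.g., buyer, seller, contractor)")}), "Names of parties involved"). Pass the finished dict through json.dumps to produce the schema_data string.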
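Tips for complex schemas:
- Arrays: include an items property defining the type of the array elements.
- Nested objects: include properties defining the nested structure.
- Nullable fields: use a type union such as ["string", "null"].
After creating an extraction job, poll its status until it completes:
import time
import requests
# After creating the extraction job, you receive a job_id
job_id = "945b4578-691f-4c74-8184-dde654093b11"
headers = {
    "accept": "application/json",
    "api-key": "your-api-key",
}
max_attempts = 120  # 120 checks x 5 seconds: give up after about 10 minutes
# Poll for job completion
for _ in range(max_attempts):
    response = requests.get(
        f"https://prod.visionapi.unsiloed.ai/extract/{job_id}",
        headers=headers,
        timeout=30,  # don't hang forever on a single request
    )
    if response.status_code != 200:
        print(f"Error checking status: {response.status_code}")
        break
    result = response.json()
    status = result["status"]
    print(f"Job status: {status}")
    # Compare case-insensitively: the submit response reports "queued" in lowercase
    if status.lower() == "succeeded":
        print("Extraction completed!")
        print("Extracted data:", result["result"])
        break
    if status.lower() == "failed":
        print(f"Job failed: {result.get('error', 'Unknown error')}")
        break
    time.sleep(5)  # Wait 5 seconds before checking again
else:
    print("Gave up waiting: the job did not finish within the polling window")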
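For reuse across jobs, the polling loop can be wrapped in a function with an overall deadline. A minimal sketch, assuming the same "Succeeded" and "Failed" status values as the loop above; wait_for_job, fetch_status and JobFailed are hypothetical names, and the sleep and clock parameters exist only so the function can be tested without real waiting.

```python
import time


class JobFailed(RuntimeError):
    """Raised when the job reports a Failed status."""


def wait_for_job(fetch_status, poll_interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job succeeds, fails, or timeout seconds pass.

    fetch_status must return the parsed JSON status payload.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        status = str(payload.get("status", "")).lower()
        if status == "succeeded":
            return payload
        if status == "failed":
            raise JobFailed(payload.get("error", "Unknown error"))
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

Usage with requests, reusing the status URL and headers from the loop above: wait_for_job(lambda: requests.get(status_url, headers=headers, timeout=30).json()). A non-200 response would need its own handling before .json() is called.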
{
"type": "object",
"properties": {
"company_info": {
"type": "object",
"description": "Company identification and basic information",
"properties": {
"name": {
"type": "string",
"description": "Full company name"
},
"ticker": {
"type": "string",
"description": "Stock ticker symbol"
},
"sector": {
"type": "string",
"description": "Business sector"
}
},
"required": ["name"],
"additionalProperties": false
},
"financial_data": {
"type": "object",
"description": "Financial metrics and performance data",
"properties": {
"revenue": {
"type": "number",
"description": "Total revenue"
},
"profit_margin": {
"type": "number",
"description": "Profit margin percentage"
}
},
"required": ["revenue"],
"additionalProperties": false
}
},
"required": ["company_info", "financial_data"],
"additionalProperties": false
}
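With nested objects it can help to list every leaf field a schema defines, for example to plan spreadsheet columns before the job runs. A small sketch that walks the schema (not the API response); leaf_paths is a hypothetical helper that joins nested property names with dots and marks array items with [].

```python
def leaf_paths(schema, prefix=""):
    """Return (dotted_path, type) pairs for every leaf field in a schema."""
    kind = schema.get("type")
    if kind == "object":
        paths = []
        for name, sub in schema.get("properties", {}).items():
            child = f"{prefix}.{name}" if prefix else name
            paths.extend(leaf_paths(sub, child))
        return paths
    if kind == "array" and "items" in schema:
        return leaf_paths(schema["items"], f"{prefix}[]")
    return [(prefix, kind)]
```

For the schema above this yields company_info.name, company_info.ticker and company_info.sector (strings) plus financial_data.revenue and financial_data.profit_margin (numbers).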
{
"type": "object",
"properties": {
"transactions": {
"type": "array",
"description": "List of financial transactions",
"items": {
"type": "object",
"properties": {
"date": {
"type": "string",
"description": "Transaction date"
},
"amount": {
"type": "number",
"description": "Transaction amount"
},
"description": {
"type": "string",
"description": "Transaction description"
},
"category": {
"type": "string",
"description": "Transaction category"
}
},
"required": ["date", "amount", "description"],
"additionalProperties": false
}
}
},
"required": ["transactions"],
"additionalProperties": false
}
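Array-of-object results like these transactions map naturally onto CSV rows. A minimal sketch, assuming the extracted transactions arrive as a list of plain dicts matching the item schema (the exact result shape for arrays is not shown in this section); transactions_to_csv is a hypothetical helper. Because category is not required, a missing value becomes an empty cell.

```python
import csv
import io


def transactions_to_csv(transactions):
    """Write transaction dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["date", "amount", "description", "category"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(transactions)  # missing keys are written as "" (restval)
    return buffer.getvalue()
```

DictWriter raises on unexpected keys by default, which matches additionalProperties: false in the item schema.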
schema_data: JSON schema defining the structure and fields to extract from the document. Example: {"type":"object","properties":{"invoice_number":{"type":"string","description":"The invoice number"}},"required":["invoice_number"],"additionalProperties":false}
pdf_file: The PDF file to process for data extraction. Maximum file size: 100MB. Either pdf_file or file_url must be provided.
file_url: URL to a PDF file to process. Either pdf_file or file_url must be provided.
Model tier to use for extraction. Options: alpha, beta, gamma (default, recommended), delta.
enable_citations: Return bounding box coordinates for extracted values. Defaults to false.
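The parameter rules above can be checked client-side before a request is sent. A minimal sketch; check_request is a hypothetical helper, treating 100MB as 100 MiB is an assumption, and so is sending enable_citations as the string "true" or "false". The model tier is validated but not added to the returned fields, because its form field name is not listed here.

```python
import json

MAX_PDF_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB
MODEL_TIERS = {"alpha", "beta", "gamma", "delta"}


def check_request(schema, pdf_bytes=None, file_url=None, tier="gamma", enable_citations=False):
    """Validate inputs against the documented rules and return the text form fields."""
    if pdf_bytes is None and not file_url:
        raise ValueError("Either pdf_file or file_url must be provided")
    if pdf_bytes is not None and len(pdf_bytes) > MAX_PDF_BYTES:
        raise ValueError("pdf_file exceeds the 100MB limit")
    if tier not in MODEL_TIERS:
        raise ValueError(f"unknown model tier {tier!r}; expected one of {sorted(MODEL_TIERS)}")
    fields = {"schema_data": json.dumps(schema)}
    if file_url:
        fields["file_url"] = file_url
    # Assumption: booleans are sent as lowercase strings in the multipart form.
    fields["enable_citations"] = "true" if enable_citations else "false"
    return fields
```

The returned fields can be sent as multipart form fields alongside pdf_file, as in the curl example.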