curl -X POST "https://prod.visionapi.unsiloed.ai/extract" \
-H "accept: application/json" \
-H "api-key: your-api-key" \
-H "Content-Type: multipart/form-data" \
-F "[email protected];type=application/pdf" \
-F "schema_data={\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\",\"description\":\"Document title\"},\"date\":{\"type\":\"string\",\"description\":\"Document date\"}},\"required\":[\"title\",\"date\"],\"additionalProperties\":false}"
Example response:
{
"job_id": "945b4578-691f-4c74-8184-dde654093b11",
"status": "queued",
"message": "PDF citation processing started",
"quota_remaining": 48988
}
Extract structured data from PDF documents using custom JSON schemas, with optional citations that trace each extracted value back to its location in the source document.
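The escaped schema string in the curl command is easy to get wrong by hand. Below is a minimal Python sketch that builds the form fields with `json.dumps` and submits them with `requests`, assuming the same endpoint and headers as the curl example. The name of the PDF form field is not recoverable from this page, so `pdf_file` is a hypothetical placeholder. Sending `citation_config` as a JSON-encoded form field is also an assumption.

```python
import json
import os
from typing import Optional

API_URL = "https://prod.visionapi.unsiloed.ai/extract"


def build_form_fields(schema: dict, citation_config: Optional[dict] = None) -> dict:
    """Serialize the schema (and optional citation settings) as compact JSON strings."""
    fields = {"schema_data": json.dumps(schema, separators=(",", ":"))}
    if citation_config is not None:
        # Assumption: citation_config travels as a JSON-encoded form field.
        fields["citation_config"] = json.dumps(citation_config, separators=(",", ":"))
    return fields


def submit_extraction(pdf_path: str, schema: dict, api_key: str,
                      citation_config: Optional[dict] = None,
                      file_field: str = "pdf_file") -> dict:
    """Upload a PDF and return the queued job description (job_id, status, ...)."""
    import requests  # third-party; imported lazily so build_form_fields needs only the stdlib

    headers = {"accept": "application/json", "api-key": api_key}
    with open(pdf_path, "rb") as pdf:
        # file_field is a hypothetical form-field name, not confirmed by this page.
        files = {file_field: (os.path.basename(pdf_path), pdf, "application/pdf")}
        response = requests.post(API_URL, headers=headers,
                                 data=build_form_fields(schema, citation_config),
                                 files=files, timeout=60)
    response.raise_for_status()
    return response.json()


# Same schema as the curl example, written as a Python dict.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Document title"},
        "date": {"type": "string", "description": "Document date"},
    },
    "required": ["title", "date"],
    "additionalProperties": False,
}
print(build_form_fields(schema)["schema_data"])
```

`requests` builds the multipart body and its Content-Type header (including the boundary) itself. For that reason, the sketch does not set that header by hand.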
semantic_citations (boolean): Enable segment-level citations. Default: false
ocr_citations (boolean): Enable word-level citations with OCR coordinates. Default: false
{
"semantic_citations": true,
"ocr_citations": true
}
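Set both to true for segment-level and word-level citations.
Set semantic_citations: true, ocr_citations: false for segment-level citations only.
Set both to false (default) when citations aren't needed.

The citation fields (bboxes and page_no) are returned only when semantic_citations or ocr_citations is enabled in citation_config. Example schema and response with citations enabled:
{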
"type": "object",
"properties": {
"Individuals": {
"type": "string",
"description": "Percentage Holding"
},
"LIC of India": {
"type": "string",
"description": "No of Shares Held"
},
"United bank of india": {
"type": "string",
"description": "No of shares held by United bank of india"
}
},
"required": [
"Individuals",
"LIC of India",
"United bank of india"
],
"additionalProperties": false
}
Response:
{
"Individuals": {
"score": 0.9998314521743098,
"value": "10.57",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"LIC of India": {
"score": 0.9999889986487799,
"value": "1515000",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"United bank of india": {
"score": 0.999984548437705,
"value": "500000",
"bboxes": [
{
"bbox": [
79,
381,
524,
565
]
}
],
"page_no": 2
},
"min_confidence_score": 0.9998314521743098
}
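A response like the one above can be post-processed to flag low-confidence values and locate their source. The sketch below assumes flat fields and relies only on the keys shown here: value and score, plus bboxes and page_no when citations are enabled, and the top-level min_confidence_score. The 0.99 threshold is an arbitrary illustration. This page does not document the coordinate order or units of the four bbox numbers.

```python
from typing import Any, Dict, List


def summarize_fields(response: Dict[str, Any], threshold: float = 0.99) -> List[Dict[str, Any]]:
    """Flatten per-field results; bboxes and page_no are absent when citations are disabled."""
    rows = []
    for name, field in response.items():
        if not isinstance(field, dict):
            continue  # skip top-level scalars such as min_confidence_score
        rows.append({
            "field": name,
            "value": field.get("value"),
            "score": field.get("score"),
            "page_no": field.get("page_no"),  # None when citations are off
            "bboxes": [b["bbox"] for b in field.get("bboxes", [])],
            "low_confidence": field.get("score", 0.0) < threshold,
        })
    return rows


sample = {
    "Individuals": {"score": 0.9998314521743098, "value": "10.57",
                    "bboxes": [{"bbox": [79, 381, 524, 565]}], "page_no": 2},
    "LIC of India": {"score": 0.9999889986487799, "value": "1515000",
                     "bboxes": [{"bbox": [79, 381, 524, 565]}], "page_no": 2},
    "min_confidence_score": 0.9998314521743098,
}
for row in summarize_fields(sample):
    print(row["field"], row["value"], row["page_no"], row["low_confidence"])

# In this example the reported minimum matches the lowest per-field score.
print(min(r["score"] for r in summarize_fields(sample)) == sample["min_confidence_score"])
```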
The citation_config parameter controls the level of citation detail returned with extracted data. Citations provide references back to the source document, so you can trace where each extracted value was found.
Example configurations:
Segment-level and word-level citations:
{
"semantic_citations": true,
"ocr_citations": true
}
Segment-level citations only:
{
"semantic_citations": true,
"ocr_citations": false
}
No citations (default):
{
"semantic_citations": false,
"ocr_citations": false
}
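The three configurations above correspond to a single choice of detail level. The small helper below uses the hypothetical level names "none", "segment" and "word", so callers cannot request a combination the documentation does not describe. Word-level citations appear here only alongside segment-level ones (both flags true), so the sketch never emits ocr_citations: true on its own.

```python
from typing import Dict

# Hypothetical level names mapped to the documented citation_config combinations.
CITATION_LEVELS: Dict[str, Dict[str, bool]] = {
    "none": {"semantic_citations": False, "ocr_citations": False},    # default
    "segment": {"semantic_citations": True, "ocr_citations": False},  # segment-level only
    "word": {"semantic_citations": True, "ocr_citations": True},      # segment + word-level
}


def citation_config_for(level: str = "none") -> Dict[str, bool]:
    """Return a fresh citation_config dict for the requested detail level."""
    try:
        return dict(CITATION_LEVELS[level])  # copy so callers can't mutate the table
    except KeyError:
        raise ValueError(f"unknown citation level {level!r}; "
                         f"expected one of {sorted(CITATION_LEVELS)}") from None


print(citation_config_for("segment"))
```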
When both semantic_citations and ocr_citations are set to false (the default), the response contains only value and score for each field. The bboxes and page_no fields are omitted.

The schema_data parameter must be a valid JSON Schema that defines the structure of the data to extract. Every schema must follow the JSON Schema specification, with proper type definitions, properties, and constraints.
Required schema elements:
type: "object" (at the root level)
properties: object defining the fields to extract
required: array of required field names
additionalProperties: set to false for strict validation

Example schema with arrays of objects:
{
"type": "object",
"properties": {
"Individuals": {
"type": "string",
"description": "Percentage Holding"
},
"LIC of India": {
"type": "number",
"description": "No of Shares Held"
},
"board of directors": {
"type": "array",
"description": "list of names of board of directors",
"items": {
"type": "object",
"required": [
"names of board of directors"
],
"properties": {
"names of board of directors": {
"type": "string",
"description": "names of all the members of board of directors of ACRE"
}
},
"additionalProperties": false
}
},
"shareholding pattern": {
"type": "array",
"description": "shareholding pattern",
"items": {
"type": "object",
"required": [
"name of shareholders",
"number of shares held"
],
"properties": {
"name of shareholders": {
"type": "string",
"description": "name of the shareholders in ACRE Table"
},
"number of shares held": {
"type": "string",
"description": "numbers of shares held by shareholders in ACRE Table"
}
},
"additionalProperties": false
}
}
},
"required": [
"Individuals",
"LIC of India",
"board of directors",
"shareholding pattern"
],
"additionalProperties": false
}
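The rules above can be checked locally before uploading. The following sketch is not the service's validator. It enforces only the conventions described on this page: a root type of "object" with a required array, a properties object and additionalProperties set to false on every object (as in these examples), required names that exist in properties, and an items definition on every array. It reports the path of each problem it finds.

```python
from typing import Any, List


def check_schema(node: Any, path: str = "$") -> List[str]:
    """Return a list of problems; an empty list means the documented rules are met."""
    if not isinstance(node, dict):
        return [f"{path}: expected a schema object"]
    problems: List[str] = []
    declared = node.get("type")
    # A type may be a single name or a list such as ["string", "null"].
    types = declared if isinstance(declared, list) else [declared]
    if path == "$":
        if declared != "object":
            problems.append('$: root type must be "object"')
        if "required" not in node:
            problems.append("$: root schema needs a required array")
    if "object" in types:
        props = node.get("properties")
        if not isinstance(props, dict):
            problems.append(f"{path}: object needs a properties object")
            props = {}
        for name in node.get("required", []):
            if name not in props:
                problems.append(f"{path}: required field {name!r} is not in properties")
        if node.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties should be false")
        for name, child in props.items():
            problems.extend(check_schema(child, f"{path}.{name}"))
    if "array" in types:
        if "items" not in node:
            problems.append(f"{path}: array needs an items definition")
        else:
            problems.extend(check_schema(node["items"], f"{path}[]"))
    return problems


good = {
    "type": "object",
    "properties": {
        "authors": {"type": "array", "items": {"type": "string"}},
        "doi": {"type": ["string", "null"], "description": "Digital Object Identifier"},
    },
    "required": ["authors"],
    "additionalProperties": False,
}
bad = {"type": "object", "properties": {"tags": {"type": "array"}}, "required": ["title"]}
print(check_schema(good))
print(check_schema(bad))
```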
Example schema with a nested object of arrays:
{
"type": "object",
"properties": {
"shares held by Punjab National bank": {
"type": "string",
"description": "shares held by Punjab National bank"
},
"shares held by IFCI": {
"type": "string",
"description": "shares held by IFCI"
},
"shareholding pattern": {
"type": "object",
"description": "shareholding pattern",
"properties": {
"Percentage holding": {
"type": "array",
"description": "percentage holding of shareholders in ACRE",
"items": {
"type": "string",
"description": "percentage holding of shareholders in ACRE"
}
},
"Name of shareholders": {
"type": "array",
"description": "Names of shareholders in ACRE",
"items": {
"type": "string",
"description": "Names of shareholders in ACRE"
}
}
},
"required": ["Percentage holding", "Name of shareholders"],
"additionalProperties": false
},
"names of board of directors": {
"type": "array",
"description": "list of names of members of board of directors in ACRE",
"items": {
"type": "object",
"properties": {
"names of board of directors": {
"type": "string",
"description": "list of names of members of board of directors in ACRE"
}
},
"required": ["names of board of directors"],
"additionalProperties": false
}
}
},
"required": [
"shares held by Punjab National bank",
"shares held by IFCI",
"shareholding pattern",
"names of board of directors"
],
"additionalProperties": false
}
Example schema for research papers:
{
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Document title or paper title"
},
"authors": {
"type": "array",
"description": "List of author names",
"items": {
"type": "string"
}
},
"publication_date": {
"type": "string",
"description": "Publication date in YYYY-MM-DD format"
},
"journal_name": {
"type": "string",
"description": "Name of journal or publication venue"
},
"doi": {
"type": "string",
"description": "Digital Object Identifier"
},
"abstract": {
"type": "string",
"description": "Document abstract or summary"
},
"keywords": {
"type": "array",
"description": "Key terms and subject keywords",
"items": {
"type": "string"
}
},
"references": {
"type": "array",
"description": "List of cited references",
"items": {
"type": "string"
}
}
},
"required": ["title", "authors"],
"additionalProperties": false
}
Example schema for legal documents:
{
"type": "object",
"properties": {
"document_type": {
"type": "string",
"description": "Type of legal document (contract, agreement, etc.)"
},
"parties": {
"type": "array",
"description": "Names of parties involved",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Party name"
},
"role": {
"type": "string",
"description": "Party role (e.g., buyer, seller, contractor)"
}
},
"required": ["name", "role"],
"additionalProperties": false
}
},
"effective_date": {
"type": "string",
"description": "Document effective date"
},
"key_terms": {
"type": "array",
"description": "Important terms and conditions",
"items": {
"type": "string"
}
},
"obligations": {
"type": "array",
"description": "Key obligations and responsibilities",
"items": {
"type": "object",
"properties": {
"party": {
"type": "string",
"description": "Party responsible for the obligation"
},
"obligation": {
"type": "string",
"description": "Description of the obligation"
}
},
"required": ["party", "obligation"],
"additionalProperties": false
}
}
},
"required": ["document_type", "parties", "effective_date"],
"additionalProperties": false
}
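Every object in these examples repeats the same boilerplate (type, properties, required, additionalProperties: false), and a few small constructors can keep hand-written schemas consistent. The helper names below are hypothetical. The sketch rebuilds the parties field of the legal-document schema above. By default it marks every property of an object as required. That default matches the nested objects in these examples but not the root schemas, where only some fields are required, so pass an explicit list there.

```python
import json
from typing import Dict, List, Optional


def str_field(description: str) -> dict:
    return {"type": "string", "description": description}


def array_field(description: str, items: dict) -> dict:
    # Arrays always carry an items definition.
    return {"type": "array", "description": description, "items": items}


def object_field(properties: Dict[str, dict], required: Optional[List[str]] = None,
                 description: Optional[str] = None) -> dict:
    """Strict object schema; every property is required unless a list is given."""
    schema: dict = {"type": "object"}
    if description is not None:
        schema["description"] = description
    schema["properties"] = properties
    schema["required"] = list(properties) if required is None else required
    schema["additionalProperties"] = False
    return schema


parties = array_field(
    "Names of parties involved",
    object_field({
        "name": str_field("Party name"),
        "role": str_field("Party role (e.g., buyer, seller, contractor)"),
    }),
)
print(json.dumps(parties, indent=2))
```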
Schema tips:
Arrays must include an items property defining the type of array elements.
Nested objects must include properties defining their nested structure.
Nullable fields can use a type array such as ["string", "null"].

Checking job status: extraction runs asynchronously. Poll the jobs endpoint with the returned job_id until the job succeeds or fails.
import time

import requests

# After creating the extraction job, you receive a job_id
job_id = "945b4578-691f-4c74-8184-dde654093b11"
headers = {
    "accept": "application/json",
    "api-key": "your-api-key",
}

# Poll for job completion
while True:
    response = requests.get(
        f"https://prod.visionapi.unsiloed.ai/jobs/{job_id}",
        headers=headers,
        timeout=30,  # don't hang forever on a stalled connection
    )
    if response.status_code != 200:
        print(f"Error checking status: {response.status_code}")
        break

    result = response.json()
    print(f"Job status: {result['status']}")
    if result["status"] == "Succeeded":
        print("Extraction completed!")
        print("Extracted data:", result["result"])
        break
    if result["status"] == "Failed":
        print(f"Job failed: {result.get('error', 'Unknown error')}")
        break

    time.sleep(5)  # Wait 5 seconds before checking again
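The loop above polls forever if a job never reaches Succeeded or Failed. The variant sketched below adds an overall timeout. The status fetch is passed in as a function, so the loop can be exercised without network access. It assumes the Succeeded and Failed status values used above and treats any other value as still in progress. The canned demo result is made up.

```python
import time
from typing import Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict], interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Dict:
    """Poll fetch_status() until the job succeeds, fails, or the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "Succeeded":
            return job
        if status == "Failed":
            raise RuntimeError(f"Job failed: {job.get('error', 'Unknown error')}")
        if clock() + interval > deadline:
            raise TimeoutError(f"Job still {status!r} after about {timeout} seconds")
        sleep(interval)


# Wiring it to the jobs endpoint would look roughly like (requires requests):
#   fetch = lambda: requests.get(f"https://prod.visionapi.unsiloed.ai/jobs/{job_id}",
#                                headers=headers, timeout=30).json()
# Demo with canned statuses instead of HTTP calls:
canned = iter([{"status": "queued"}, {"status": "queued"},
               {"status": "Succeeded", "result": {"title": "Example"}}])
job = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)
print(job["result"])
```

Injecting `sleep` and `clock` keeps the production defaults (`time.sleep`, `time.monotonic`) while letting tests run instantly.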
Example schema with nested objects:
{
"type": "object",
"properties": {
"company_info": {
"type": "object",
"description": "Company identification and basic information",
"properties": {
"name": {
"type": "string",
"description": "Full company name"
},
"ticker": {
"type": "string",
"description": "Stock ticker symbol"
},
"sector": {
"type": "string",
"description": "Business sector"
}
},
"required": ["name"],
"additionalProperties": false
},
"financial_data": {
"type": "object",
"description": "Financial metrics and performance data",
"properties": {
"revenue": {
"type": "number",
"description": "Total revenue"
},
"profit_margin": {
"type": "number",
"description": "Profit margin percentage"
}
},
"required": ["revenue"],
"additionalProperties": false
}
},
"required": ["company_info", "financial_data"],
"additionalProperties": false
}
Example schema with an array of objects:
{
"type": "object",
"properties": {
"transactions": {
"type": "array",
"description": "List of financial transactions",
"items": {
"type": "object",
"properties": {
"date": {
"type": "string",
"description": "Transaction date"
},
"amount": {
"type": "number",
"description": "Transaction amount"
},
"description": {
"type": "string",
"description": "Transaction description"
},
"category": {
"type": "string",
"description": "Transaction category"
}
},
"required": ["date", "amount", "description"],
"additionalProperties": false
}
}
},
"required": ["transactions"],
"additionalProperties": false
}
The PDF file to process for data extraction. Maximum file size: 100MB
JSON schema defining the structure and fields to extract from the document. Example: {"type":"object","properties":{"invoice_number":{"type":"string","description":"The invoice number"}},"required":["invoice_number"],"additionalProperties":false}
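A client can reject files that exceed the documented 100MB limit before uploading. This sketch treats 100MB as 100 × 1024 × 1024 bytes. That is an assumption, because the page does not say whether MB is decimal or binary. The sketch also checks for the %PDF- header that PDF files normally begin with. The server remains the authority on what it accepts.

```python
import os

MAX_PDF_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as binary megabytes


def check_pdf(path: str, max_bytes: int = MAX_PDF_BYTES) -> None:
    """Raise ValueError if the file is too large or does not look like a PDF."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes} bytes")
    with open(path, "rb") as f:
        # Well-formed PDFs start with a "%PDF-" header line.
        if f.read(5) != b"%PDF-":
            raise ValueError(f"{path} does not start with a %PDF- header")
```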