Overview
This page provides ready-to-use Anthropic tool-use JSON schemas that let Claude interact with Unsiloed’s document processing API. Copy the tool definitions into your messages.create() call and Claude can parse, extract, classify, and split documents on your behalf.
All Unsiloed API operations are asynchronous. Each tool submits a job and returns a job_id. Use the unsiloed_get_job_result tool to poll for results. Since Claude cannot upload binary files, all tools accept a publicly accessible URL to the document.
Prerequisites
Before using these tool schemas, you’ll need:
- An Unsiloed API key — sign up at unsiloed.ai to get one
- An Anthropic API key — get one from the Anthropic console
- Python 3.8+ or Node.js 18+
pip install anthropic requests
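The snippets on this page hard-code API keys for brevity. In practice you may prefer reading them from the environment; the variable names below are a suggested convention, not something either SDK requires:

```python
import os

# Suggested convention: keep both keys in environment variables.
# These variable names are illustrative; neither SDK mandates them.
UNSILOED_API_KEY = os.environ.get("UNSILOED_API_KEY", "")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
```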
Key Features
Drop-In Tool Schemas
Copy-paste JSON tool definitions directly into your Anthropic API calls
Async Job Handling
Built-in polling tool to retrieve results from asynchronous operations
Core Document Operations
Parse documents, extract structured data, classify, and split PDFs
Agentic Loop Ready
Full working example of an autonomous document processing agent
Tool Definitions
Each tool follows Anthropic’s tool format with name, description, and input_schema. Expand each tool below to see its full schema.
unsiloed_parse_document
Parse documents (PDF, images, Office files) into structured chunks with element detection, OCR, and reading order analysis.
{
  "name": "unsiloed_parse_document",
  "description": "Parse a document into structured chunks with element detection, text extraction, and reading order analysis. Use this tool when the user wants to break a document into its structural components (text blocks, tables, images, headers, footers) or convert a document to markdown. You must provide a publicly accessible URL to the document. Supported formats include PDF, PNG, JPEG, TIFF, BMP, DOCX, XLSX, and PPTX. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the parsed content.",
  "input_schema": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Publicly accessible URL to the document to parse."
      },
      "use_high_resolution": {
        "type": "boolean",
        "description": "Use high-resolution image processing for better OCR accuracy. Defaults to false."
      },
      "segmentation_method": {
        "type": "string",
        "enum": ["smart_layout_detection", "page_by_page"],
        "description": "Document segmentation strategy. 'smart_layout_detection' (default) groups related elements into semantic chunks. 'page_by_page' creates one chunk per page."
      },
      "ocr_mode": {
        "type": "string",
        "enum": ["auto_ocr", "full_ocr"],
        "description": "OCR strategy. 'auto_ocr' (default) only runs OCR when needed. 'full_ocr' forces OCR on all pages."
      },
      "merge_tables": {
        "type": "boolean",
        "description": "Merge adjacent table segments into a single table. Defaults to false."
      },
      "segment_filter": {
        "type": "string",
        "description": "Filter output to include only specific segment types. Accepts comma-separated values (e.g., 'table', 'picture', 'table,picture') or 'all' for everything. Defaults to 'all'."
      },
      "xml_citation": {
        "type": "boolean",
        "description": "Enable citation extraction from PDF documents. Extracts structured bibliography and in-text citations in markdown output. Defaults to false."
      },
      "output_fields": {
        "type": "string",
        "description": "JSON object controlling which fields are included in the response. Set fields to false to exclude them and reduce response size. Available fields: html, markdown, ocr, image, content, bbox, confidence, embed. All default to true. Example: {\"html\":true,\"markdown\":true,\"ocr\":false,\"image\":false}"
      },
      "segment_analysis": {
        "type": "string",
        "description": "JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Example: {\"Table\":{\"html\":\"VLM\",\"markdown\":\"VLM\"},\"Picture\":{\"html\":\"VLM\",\"markdown\":\"VLM\"}}"
      },
      "page_range": {
        "type": "string",
        "description": "Specify which pages to process. Formats: '1-5', '2,4,6', or '[1,3,5]'. Defaults to all pages."
      },
      "segment_type_naming": {
        "type": "string",
        "enum": ["Unsiloed", "Other"],
        "description": "Segment type naming convention. 'Unsiloed' (default) uses names like PageHeader, ListItem, Picture. 'Other' uses alternative names like Header, List Item, Figure."
      },
      "detect_checkboxes": {
        "type": "boolean",
        "description": "Detect and identify checkboxes in the document with their bounding box locations. Defaults to false."
      },
      "extract_charts": {
        "type": "boolean",
        "description": "Extract structured data from charts and graphs, including data points and chart type information. Defaults to false."
      },
      "extract_colors": {
        "type": "boolean",
        "description": "Transfer text color from the PDF text layer to OCR results. Defaults to false."
      },
      "extract_links": {
        "type": "boolean",
        "description": "Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false."
      },
      "export_format": {
        "type": "string",
        "description": "JSON array of export format(s) to generate after parsing. The exported files are available as presigned URLs in the exports field of the response. Currently supported: [\"docx\"]. Example: [\"docx\"]"
      },
      "error_handling": {
        "type": "string",
        "enum": ["Continue", "Fail"],
        "description": "How to handle per-page errors. 'Continue' (default) skips failed pages and continues. 'Fail' aborts the entire job on the first error."
      },
      "expires_in": {
        "type": "integer",
        "description": "Seconds until the task and its output are automatically deleted."
      },
      "chunk_processing": {
        "type": "string",
        "description": "JSON object for chunk processing configuration."
      },
      "llm_processing": {
        "type": "string",
        "description": "JSON object for LLM processing configuration."
      }
    },
    "required": ["url"]
  }
}
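Several parse parameters (output_fields, segment_analysis, export_format) are JSON-stringified values. A small sketch of assembling a parse tool input in Python, serializing those fields with json.dumps so they are always valid JSON strings (the URL is a placeholder):

```python
import json

# Build a parse tool input. output_fields and segment_analysis must be
# JSON strings, not Python dicts, so serialize them with json.dumps.
tool_input = {
    "url": "https://example.com/report.pdf",  # placeholder URL
    "segmentation_method": "smart_layout_detection",
    "output_fields": json.dumps({"html": True, "markdown": True, "ocr": False, "image": False}),
    "segment_analysis": json.dumps({"Table": {"html": "VLM", "markdown": "VLM"}}),
    "page_range": "1-5",
}
```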
unsiloed_extract_data
Extract structured data from PDF documents using a custom JSON schema you define.
{
  "name": "unsiloed_extract_data",
  "description": "Extract structured data from a PDF document using a custom JSON schema. Use this tool when the user wants to pull specific fields (such as invoice numbers, names, dates, or line items) out of a PDF. You must provide a publicly accessible URL to the PDF and a JSON schema defining the fields to extract. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the extracted data. Do not use this tool for document classification or splitting.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to process."
      },
      "schema_data": {
        "type": "string",
        "description": "A JSON-stringified schema defining the fields to extract. Example: {\"type\":\"object\",\"properties\":{\"invoice_number\":{\"type\":\"string\",\"description\":\"The invoice number\"},\"total\":{\"type\":\"number\",\"description\":\"Total amount\"}},\"required\":[\"invoice_number\"],\"additionalProperties\":false}"
      },
      "model": {
        "type": "string",
        "enum": ["alpha", "beta", "gamma", "delta"],
        "description": "Model tier for extraction. Default is 'gamma', recommended for most use cases."
      },
      "enable_citations": {
        "type": "boolean",
        "description": "When true, returns bounding box coordinates for each extracted value, enabling you to trace data back to its location in the PDF. Defaults to false."
      }
    },
    "required": ["file_url", "schema_data"]
  }
}
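A sketch of constructing schema_data for this tool in Python. The invoice fields are illustrative; define whatever fields your documents actually need:

```python
import json

# Hypothetical invoice schema; field names are examples, not fixed by the API.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total": {"type": "number", "description": "Total amount due"},
    },
    "required": ["invoice_number"],
    "additionalProperties": False,
}

tool_input = {
    "file_url": "https://example.com/invoice.pdf",  # placeholder URL
    "schema_data": json.dumps(invoice_schema),  # must be a JSON string, not a dict
    "model": "gamma",
}
```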
unsiloed_classify_document
Classify a PDF document into one of several predefined categories with confidence scoring.
{
  "name": "unsiloed_classify_document",
  "description": "Classify a PDF document into one of several predefined categories. Use this tool when the user wants to determine what type of document a PDF is (for example, invoice, receipt, contract, or medical record). You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the classification result with confidence scores. Do not use this for data extraction or document splitting.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to classify."
      },
      "categories": {
        "type": "string",
        "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices with itemized charges\"},{\"name\":\"Receipt\"},{\"name\":\"Contract\",\"description\":\"Legal agreements\"}]"
      }
    },
    "required": ["file_url", "categories"]
  }
}
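Since categories is a JSON-stringified array, build it as a Python list and serialize it. The category names below are examples, not a fixed taxonomy:

```python
import json

# Category objects: 'name' is required; 'description' is optional but
# improves accuracy. Serialize the list because the API expects a string.
categories = [
    {"name": "Invoice", "description": "Financial invoices with itemized charges"},
    {"name": "Receipt"},
    {"name": "Contract", "description": "Legal agreements"},
]
tool_input = {
    "file_url": "https://example.com/document.pdf",  # placeholder URL
    "categories": json.dumps(categories),
}
```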
unsiloed_split_document
Split a multi-document PDF into separate files by classifying each page into categories.
{
  "name": "unsiloed_split_document",
  "description": "Split a multi-document PDF into separate files by classifying each page into predefined categories. Use this tool when the user has a single PDF containing multiple document types (for example, a scanned batch of invoices, receipts, and contracts) and wants them separated into individual files. You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve download links for the split files. Do not use this for single-document classification or data extraction.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to split."
      },
      "categories": {
        "type": "string",
        "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices\"},{\"name\":\"Contract\"},{\"name\":\"Receipt\"}]"
      }
    },
    "required": ["file_url", "categories"]
  }
}
unsiloed_get_job_result
Poll for the result of any asynchronous Unsiloed job.
{
  "name": "unsiloed_get_job_result",
  "description": "Poll for the result of an asynchronous Unsiloed job. Use this tool after calling unsiloed_parse_document, unsiloed_extract_data, unsiloed_classify_document, or unsiloed_split_document to check whether the job has completed and retrieve its results. If the status indicates the job is still processing, wait a few seconds and call this tool again. Once the job is complete, the response contains the output data. If the job failed, the response includes an error message explaining what went wrong.",
  "input_schema": {
    "type": "object",
    "properties": {
      "job_id": {
        "type": "string",
        "description": "The job_id returned by a previous Unsiloed tool call."
      },
      "job_type": {
        "type": "string",
        "enum": ["parse", "extract", "classify", "splitter"],
        "description": "The type of job to check. Use 'parse' for parsing jobs, 'extract' for extraction jobs, 'classify' for classification jobs, and 'splitter' for splitting jobs."
      }
    },
    "required": ["job_id", "job_type"]
  }
}
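The polling pattern this tool implies can be factored into a reusable helper. This is a sketch, not part of any Unsiloed SDK: the fetch callable stands in for the actual HTTP GET, and the terminal status set covers both status-name conventions described under Error Handling below.

```python
import time

def poll_job(fetch, job_type: str, job_id: str,
             interval: float = 5.0, timeout: float = 300.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses.

    `fetch` is any callable (job_type, job_id) -> dict, keeping the HTTP
    layer pluggable and the helper testable without network access.
    """
    terminal = {"Succeeded", "Failed", "completed", "failed"}
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(job_type, job_id)
        if result.get("status") in terminal:
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

With the defaults (interval=5.0, timeout=300.0) this matches the 5-minute cutoff suggested under Error Handling.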
All Tools (Copy & Paste)
Copy this complete tools array and pass it directly to your Anthropic API call:
[
  {
    "name": "unsiloed_parse_document",
    "description": "Parse a document into structured chunks with element detection, text extraction, and reading order analysis. Use this tool when the user wants to break a document into its structural components (text blocks, tables, images, headers, footers) or convert a document to markdown. You must provide a publicly accessible URL to the document. Supported formats include PDF, PNG, JPEG, TIFF, BMP, DOCX, XLSX, and PPTX. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the parsed content.",
    "input_schema": {
      "type": "object",
      "properties": {
        "url": {
          "type": "string",
          "description": "Publicly accessible URL to the document to parse."
        },
        "use_high_resolution": {
          "type": "boolean",
          "description": "Use high-resolution image processing for better OCR accuracy. Defaults to false."
        },
        "segmentation_method": {
          "type": "string",
          "enum": ["smart_layout_detection", "page_by_page"],
          "description": "Document segmentation strategy. 'smart_layout_detection' (default) groups related elements into semantic chunks. 'page_by_page' creates one chunk per page."
        },
        "ocr_mode": {
          "type": "string",
          "enum": ["auto_ocr", "full_ocr"],
          "description": "OCR strategy. 'auto_ocr' (default) only runs OCR when needed. 'full_ocr' forces OCR on all pages."
        },
        "merge_tables": {
          "type": "boolean",
          "description": "Merge adjacent table segments into a single table. Defaults to false."
        },
        "segment_filter": {
          "type": "string",
          "description": "Filter output to include only specific segment types. Accepts comma-separated values (e.g., 'table', 'picture', 'table,picture') or 'all' for everything. Defaults to 'all'."
        },
        "xml_citation": {
          "type": "boolean",
          "description": "Enable citation extraction from PDF documents. Extracts structured bibliography and in-text citations in markdown output. Defaults to false."
        },
        "output_fields": {
          "type": "string",
          "description": "JSON object controlling which fields are included in the response. Set fields to false to exclude them and reduce response size. Available fields: html, markdown, ocr, image, content, bbox, confidence, embed. All default to true. Example: {\"html\":true,\"markdown\":true,\"ocr\":false,\"image\":false}"
        },
        "segment_analysis": {
          "type": "string",
          "description": "JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Example: {\"Table\":{\"html\":\"VLM\",\"markdown\":\"VLM\"},\"Picture\":{\"html\":\"VLM\",\"markdown\":\"VLM\"}}"
        },
        "page_range": {
          "type": "string",
          "description": "Specify which pages to process. Formats: '1-5', '2,4,6', or '[1,3,5]'. Defaults to all pages."
        },
        "segment_type_naming": {
          "type": "string",
          "enum": ["Unsiloed", "Other"],
          "description": "Segment type naming convention. 'Unsiloed' (default) uses names like PageHeader, ListItem, Picture. 'Other' uses alternative names like Header, List Item, Figure."
        },
        "detect_checkboxes": {
          "type": "boolean",
          "description": "Detect and identify checkboxes in the document with their bounding box locations. Defaults to false."
        },
        "extract_charts": {
          "type": "boolean",
          "description": "Extract structured data from charts and graphs, including data points and chart type information. Defaults to false."
        },
        "extract_colors": {
          "type": "boolean",
          "description": "Transfer text color from the PDF text layer to OCR results. Defaults to false."
        },
        "extract_links": {
          "type": "boolean",
          "description": "Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false."
        },
        "export_format": {
          "type": "string",
          "description": "JSON array of export format(s) to generate after parsing. The exported files are available as presigned URLs in the exports field of the response. Currently supported: [\"docx\"]. Example: [\"docx\"]"
        },
        "error_handling": {
          "type": "string",
          "enum": ["Continue", "Fail"],
          "description": "How to handle per-page errors. 'Continue' (default) skips failed pages and continues. 'Fail' aborts the entire job on the first error."
        },
        "expires_in": {
          "type": "integer",
          "description": "Seconds until the task and its output are automatically deleted."
        },
        "chunk_processing": {
          "type": "string",
          "description": "JSON object for chunk processing configuration."
        },
        "llm_processing": {
          "type": "string",
          "description": "JSON object for LLM processing configuration."
        }
      },
      "required": ["url"]
    }
  },
  {
    "name": "unsiloed_extract_data",
    "description": "Extract structured data from a PDF document using a custom JSON schema. Use this tool when the user wants to pull specific fields (such as invoice numbers, names, dates, or line items) out of a PDF. You must provide a publicly accessible URL to the PDF and a JSON schema defining the fields to extract. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the extracted data. Do not use this tool for document classification or splitting.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to process."
        },
        "schema_data": {
          "type": "string",
          "description": "A JSON-stringified schema defining the fields to extract. Example: {\"type\":\"object\",\"properties\":{\"invoice_number\":{\"type\":\"string\",\"description\":\"The invoice number\"},\"total\":{\"type\":\"number\",\"description\":\"Total amount\"}},\"required\":[\"invoice_number\"],\"additionalProperties\":false}"
        },
        "model": {
          "type": "string",
          "enum": ["alpha", "beta", "gamma", "delta"],
          "description": "Model tier for extraction. Default is 'gamma', recommended for most use cases."
        },
        "enable_citations": {
          "type": "boolean",
          "description": "When true, returns bounding box coordinates for each extracted value, enabling you to trace data back to its location in the PDF. Defaults to false."
        }
      },
      "required": ["file_url", "schema_data"]
    }
  },
  {
    "name": "unsiloed_classify_document",
    "description": "Classify a PDF document into one of several predefined categories. Use this tool when the user wants to determine what type of document a PDF is (for example, invoice, receipt, contract, or medical record). You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the classification result with confidence scores. Do not use this for data extraction or document splitting.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to classify."
        },
        "categories": {
          "type": "string",
          "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices with itemized charges\"},{\"name\":\"Receipt\"},{\"name\":\"Contract\",\"description\":\"Legal agreements\"}]"
        }
      },
      "required": ["file_url", "categories"]
    }
  },
  {
    "name": "unsiloed_split_document",
    "description": "Split a multi-document PDF into separate files by classifying each page into predefined categories. Use this tool when the user has a single PDF containing multiple document types (for example, a scanned batch of invoices, receipts, and contracts) and wants them separated into individual files. You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve download links for the split files. Do not use this for single-document classification or data extraction.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to split."
        },
        "categories": {
          "type": "string",
          "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices\"},{\"name\":\"Contract\"},{\"name\":\"Receipt\"}]"
        }
      },
      "required": ["file_url", "categories"]
    }
  },
  {
    "name": "unsiloed_get_job_result",
    "description": "Poll for the result of an asynchronous Unsiloed job. Use this tool after calling unsiloed_parse_document, unsiloed_extract_data, unsiloed_classify_document, or unsiloed_split_document to check whether the job has completed and retrieve its results. If the status indicates the job is still processing, wait a few seconds and call this tool again. Once the job is complete, the response contains the output data. If the job failed, the response includes an error message explaining what went wrong.",
    "input_schema": {
      "type": "object",
      "properties": {
        "job_id": {
          "type": "string",
          "description": "The job_id returned by a previous Unsiloed tool call."
        },
        "job_type": {
          "type": "string",
          "enum": ["parse", "extract", "classify", "splitter"],
          "description": "The type of job to check. Use 'parse' for parsing jobs, 'extract' for extraction jobs, 'classify' for classification jobs, and 'splitter' for splitting jobs."
        }
      },
      "required": ["job_id", "job_type"]
    }
  }
]
Usage with the Anthropic SDK
Here’s how to register these tools and handle Claude’s tool calls:
import anthropic
import requests
import json
import time

UNSILOED_API_KEY = "your-unsiloed-api-key"
UNSILOED_BASE_URL = "https://prod.visionapi.unsiloed.ai"
UNSILOED_HEADERS = {"api-key": UNSILOED_API_KEY}

# Load the tools array from the copy-paste block above
tools = [...]  # Paste the full tools array here

client = anthropic.Anthropic(api_key="your-anthropic-api-key")

def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Execute an Unsiloed API tool call and return the result as a string."""
    if tool_name == "unsiloed_parse_document":
        data = {"url": tool_input["url"]}
        if "use_high_resolution" in tool_input:
            data["use_high_resolution"] = str(tool_input["use_high_resolution"]).lower()
        if "segmentation_method" in tool_input:
            data["segmentation_method"] = tool_input["segmentation_method"]
        if "ocr_mode" in tool_input:
            data["ocr_mode"] = tool_input["ocr_mode"]
        if "merge_tables" in tool_input:
            data["merge_tables"] = str(tool_input["merge_tables"]).lower()
        if "segment_filter" in tool_input:
            data["segment_filter"] = tool_input["segment_filter"]
        if "xml_citation" in tool_input:
            data["xml_citation"] = str(tool_input["xml_citation"]).lower()
        resp = requests.post(f"{UNSILOED_BASE_URL}/parse", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_extract_data":
        data = {
            "file_url": tool_input["file_url"],
            "schema_data": tool_input["schema_data"],
        }
        if "model" in tool_input:
            data["model"] = tool_input["model"]
        if "enable_citations" in tool_input:
            data["enable_citations"] = str(tool_input["enable_citations"]).lower()
        resp = requests.post(f"{UNSILOED_BASE_URL}/v2/extract", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_classify_document":
        data = {
            "file_url": tool_input["file_url"],
            "categories": tool_input["categories"],
        }
        resp = requests.post(f"{UNSILOED_BASE_URL}/classify", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_split_document":
        data = {
            "file_url": tool_input["file_url"],
            "categories": tool_input["categories"],
        }
        resp = requests.post(f"{UNSILOED_BASE_URL}/splitter", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_get_job_result":
        job_type = tool_input["job_type"]
        job_id = tool_input["job_id"]
        time.sleep(5)  # Brief pause before polling
        resp = requests.get(f"{UNSILOED_BASE_URL}/{job_type}/{job_id}", headers=UNSILOED_HEADERS)
    else:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    return json.dumps(resp.json())
Complete Agentic Loop Example
This standalone example shows a full autonomous loop where Claude processes a document end-to-end:
import anthropic
import requests
import json
import time

# --- Configuration ---
ANTHROPIC_API_KEY = "your-anthropic-api-key"
UNSILOED_API_KEY = "your-unsiloed-api-key"
UNSILOED_BASE_URL = "https://prod.visionapi.unsiloed.ai"
UNSILOED_HEADERS = {"api-key": UNSILOED_API_KEY}

# --- Load tools (paste the full tools array from above) ---
tools = [...]  # Paste the complete tools JSON array here

# --- Tool executor ---
def process_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "unsiloed_extract_data":
        resp = requests.post(
            f"{UNSILOED_BASE_URL}/v2/extract",
            headers=UNSILOED_HEADERS,
            data={
                "file_url": tool_input["file_url"],
                "schema_data": tool_input["schema_data"],
                **({"model": tool_input["model"]} if "model" in tool_input else {}),
            },
        )
    elif tool_name == "unsiloed_get_job_result":
        time.sleep(5)
        resp = requests.get(
            f"{UNSILOED_BASE_URL}/{tool_input['job_type']}/{tool_input['job_id']}",
            headers=UNSILOED_HEADERS,
        )
    # Add other tool handlers (parse, classify, split) as needed
    else:
        return json.dumps({"error": f"Unhandled tool: {tool_name}"})
    return json.dumps(resp.json())

# --- Agentic loop ---
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
messages = [
    {
        "role": "user",
        "content": "Extract the invoice number, date, and total amount from this PDF: https://example.com/invoice.pdf",
    }
]

print("Starting agentic loop...\n")
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )

    # Collect assistant response
    messages.append({"role": "assistant", "content": response.content})

    # If Claude is done, print the final text and exit
    if response.stop_reason == "end_turn":
        for block in response.content:
            if hasattr(block, "text"):
                print(f"Claude: {block.text}")
        break

    # If Claude wants to use tools, execute them
    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f" -> Calling {block.name}({json.dumps(block.input, indent=2)[:100]}...)")
                result = process_tool_call(block.name, block.input)
                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    }
                )
        messages.append({"role": "user", "content": tool_results})

print("\nDone.")
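One practical refinement when running the loop in production: large parse outputs can exhaust the model’s context window when fed back as tool results. A hypothetical guard (not part of the example above; the 50,000-character limit is an arbitrary illustration) that caps tool_result payloads before appending them:

```python
def truncate_result(result: str, limit: int = 50_000) -> str:
    """Cap a tool_result payload so oversized parse outputs don't blow up the context."""
    if len(result) <= limit:
        return result
    return result[:limit] + f"... [truncated {len(result) - limit} characters]"
```

Apply it where the loop builds tool_results, e.g. "content": truncate_result(result).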
Error Handling
When integrating with the Unsiloed API, handle these common scenarios:
Job failure
If a job’s status is "Failed" or "failed", the response includes an error message. Parse jobs use capitalized statuses (Succeeded, Failed), while extraction, classification, and splitting jobs use lowercase (completed, failed).
{
  "status": "failed",
  "error": "Error processing document: unsupported file format"
}
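Because the two status conventions differ only in capitalization, a small normalizer keeps calling code uniform. This is a sketch, not an official helper:

```python
def job_state(response: dict) -> str:
    """Map both status conventions ('Succeeded'/'completed', 'Failed'/'failed')
    onto three coarse states: 'done', 'error', or 'pending'."""
    status = str(response.get("status", "")).lower()
    if status in ("succeeded", "completed"):
        return "done"
    if status == "failed":
        return "error"
    return "pending"
```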
Authentication errors
A 401 response means the API key is missing or invalid. Ensure you pass the key in the api-key header.
{
  "detail": "API key is required. Please provide 'api-key' in the request header."
}
Quota exceeded
A 402 response means your organization has run out of credits. Check the quota_remaining field in successful responses to monitor usage proactively.
{
  "detail": {
    "message": "Insufficient quota",
    "status": "QUOTA_EXCEEDED"
  }
}
Polling timeout
If a job hasn’t completed after 5 minutes of polling, treat it as a timeout. Jobs rarely take longer than 2 minutes for standard documents.
Best Practices
- Use descriptive categories — When classifying or splitting, add description fields to your category objects. This significantly improves accuracy.
- Poll with backoff — Wait 5-10 seconds between unsiloed_get_job_result calls. Tight polling wastes quota and adds no benefit.
- Use publicly accessible URLs — Claude cannot upload binary files. Use presigned URLs from your cloud storage or any publicly accessible link.
- Keep extraction schemas focused — Smaller, targeted extraction schemas produce better results than large catch-all schemas. Extract only what you need.
- Handle errors gracefully — Always check job status before processing results. Return clear error messages so Claude can inform the user.
- Monitor your quota — Check the quota_remaining field in API responses and alert when running low.
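The "poll with backoff" advice can be sketched as a generator of wait times to pair with time.sleep between unsiloed_get_job_result calls. The base, cap, and growth factor are illustrative defaults, not API requirements:

```python
def backoff_intervals(base: float = 5.0, cap: float = 30.0, factor: float = 1.5):
    """Yield successively longer wait times, capped so polls never stall too long."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```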
Next Steps
Parsing
Learn about document parsing and structure analysis
Classification
Explore document classification with confidence scoring
Splitting
Split multi-document PDFs into separate files
API Reference
Full API reference with all endpoints and parameters