Overview
This page provides ready-to-use Anthropic tool-use JSON schemas that let Claude interact with Unsiloed’s document processing API. Copy the tool definitions into your messages.create() call and Claude can parse, extract, classify, and split documents on your behalf.
All Unsiloed API operations are asynchronous. Each tool submits a job and returns a job_id. Use the unsiloed_get_job_result tool to poll for results. Since Claude cannot upload binary files, all tools accept a publicly accessible URL to the document.
Prerequisites
Before using these tool schemas, you’ll need:
- An Unsiloed API key — sign up at unsiloed.ai to get one
- An Anthropic API key — get one from the Anthropic console
- Python 3.8+ or Node.js 18+
pip install anthropic requests
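The snippets on this page hard-code API keys for brevity. In practice you may prefer reading them from the environment; the variable names below are a suggested convention, not something either SDK requires:

```python
import os

# Suggested convention: keep both keys in environment variables.
# These variable names are illustrative; neither SDK mandates them.
UNSILOED_API_KEY = os.environ.get("UNSILOED_API_KEY", "")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
```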
Key Features
Drop-In Tool Schemas
Copy-paste JSON tool definitions directly into your Anthropic API calls
Async Job Handling
Built-in polling tool to retrieve results from asynchronous operations
Core Document Operations
Parse documents, extract structured data, classify, and split PDFs
Agentic Loop Ready
Full working example of an autonomous document processing agent
Tool Definitions
Each tool follows Anthropic’s tool format with name, description, and input_schema. Expand each tool below to see its full schema.
unsiloed_parse_document
Parse documents (PDF, images, Office files) into structured chunks with element detection, OCR, and reading order analysis.
{
  "name": "unsiloed_parse_document",
  "description": "Parse a document into structured chunks with element detection, text extraction, and reading order analysis. Use this tool when the user wants to break a document into its structural components (text blocks, tables, images, headers, footers) or convert a document to markdown. You must provide a publicly accessible URL to the document. Supported formats include PDF, PNG, JPEG, TIFF, BMP, DOCX, XLSX, and PPTX. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the parsed content.",
  "input_schema": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Publicly accessible URL to the document to parse."
      },
      "use_high_resolution": {
        "type": "boolean",
        "description": "Use high-resolution image processing for better OCR accuracy. Defaults to false."
      },
      "segmentation_method": {
        "type": "string",
        "enum": ["smart_layout_detection", "page_by_page"],
        "description": "Document segmentation strategy. 'smart_layout_detection' (default) groups related elements into semantic chunks. 'page_by_page' creates one chunk per page."
      },
      "ocr_mode": {
        "type": "string",
        "enum": ["auto_ocr", "full_ocr"],
        "description": "OCR strategy. 'auto_ocr' (default) only runs OCR when needed. 'full_ocr' forces OCR on all pages."
      },
      "merge_tables": {
        "type": "boolean",
        "description": "Merge adjacent table segments into a single table. Defaults to false."
      },
      "segment_filter": {
        "type": "string",
        "description": "Filter output to include only specific segment types. Accepts comma-separated values (e.g., 'table', 'picture', 'table,picture') or 'all' for everything. Defaults to 'all'."
      },
      "xml_citation": {
        "type": "boolean",
        "description": "Enable citation extraction from PDF documents. Extracts structured bibliography and in-text citations in markdown output. Defaults to false."
      },
      "output_fields": {
        "type": "string",
        "description": "JSON object controlling which fields are included in the response. Set fields to false to exclude them and reduce response size. Available fields: html, markdown, ocr, image, content, bbox, confidence, embed. All default to true. Example: {\"html\":true,\"markdown\":true,\"ocr\":false,\"image\":false}"
      },
      "segment_analysis": {
        "type": "string",
        "description": "JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Example: {\"Table\":{\"html\":\"VLM\",\"markdown\":\"VLM\"},\"Picture\":{\"html\":\"VLM\",\"markdown\":\"VLM\"}}"
      },
      "page_range": {
        "type": "string",
        "description": "Specify which pages to process. Formats: '1-5', '2,4,6', or '[1,3,5]'. Defaults to all pages."
      },
      "segment_type_naming": {
        "type": "string",
        "enum": ["Unsiloed", "Other"],
        "description": "Segment type naming convention. 'Unsiloed' (default) uses names like PageHeader, ListItem, Picture. 'Other' uses alternative names like Header, List Item, Figure."
      },
      "detect_checkboxes": {
        "type": "boolean",
        "description": "Detect and identify checkboxes in the document with their bounding box locations. Defaults to false."
      },
      "extract_charts": {
        "type": "boolean",
        "description": "Extract structured data from charts and graphs, including data points and chart type information. Defaults to false."
      },
      "extract_colors": {
        "type": "boolean",
        "description": "Transfer text color from the PDF text layer to OCR results. Defaults to false."
      },
      "extract_links": {
        "type": "boolean",
        "description": "Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false."
      },
      "export_format": {
        "type": "string",
        "description": "JSON array of export format(s) to generate after parsing. The exported files are available as presigned URLs in the exports field of the response. Currently supported: [\"docx\"]. Example: [\"docx\"]"
      },
      "error_handling": {
        "type": "string",
        "enum": ["Continue", "Fail"],
        "description": "How to handle per-page errors. 'Continue' (default) skips failed pages and continues. 'Fail' aborts the entire job on the first error."
      },
      "expires_in": {
        "type": "integer",
        "description": "Seconds until the task and its output are automatically deleted."
      },
      "chunk_processing": {
        "type": "string",
        "description": "JSON object for chunk processing configuration."
      },
      "llm_processing": {
        "type": "string",
        "description": "JSON object for LLM processing configuration."
      }
    },
    "required": ["url"]
  }
}
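Several parse parameters (output_fields, segment_analysis, export_format) are JSON-stringified values. A small sketch of assembling a parse tool input in Python, serializing those fields with json.dumps so they are always valid JSON strings (the URL is a placeholder):

```python
import json

# Build a parse tool input. output_fields and segment_analysis must be
# JSON strings, not Python dicts, so serialize them with json.dumps.
tool_input = {
    "url": "https://example.com/report.pdf",  # placeholder URL
    "segmentation_method": "smart_layout_detection",
    "output_fields": json.dumps({"html": True, "markdown": True, "ocr": False, "image": False}),
    "segment_analysis": json.dumps({"Table": {"html": "VLM", "markdown": "VLM"}}),
    "page_range": "1-5",
}
```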
unsiloed_extract_data
Extract structured data from PDF documents using a custom JSON schema you define.
{
  "name": "unsiloed_extract_data",
  "description": "Extract structured data from a PDF document using a custom JSON schema. Use this tool when the user wants to pull specific fields (such as invoice numbers, names, dates, or line items) out of a PDF. You must provide a publicly accessible URL to the PDF and a JSON schema defining the fields to extract. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the extracted data. Do not use this tool for document classification or splitting.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to process."
      },
      "schema_data": {
        "type": "string",
        "description": "A JSON-stringified schema defining the fields to extract. Example: {\"type\":\"object\",\"properties\":{\"invoice_number\":{\"type\":\"string\",\"description\":\"The invoice number\"},\"total\":{\"type\":\"number\",\"description\":\"Total amount\"}},\"required\":[\"invoice_number\"],\"additionalProperties\":false}"
      },
      "model": {
        "type": "string",
        "enum": ["alpha", "beta", "gamma", "delta"],
        "description": "Model tier for extraction. Default is 'gamma', recommended for most use cases."
      },
      "enable_citations": {
        "type": "boolean",
        "description": "When true, returns bounding box coordinates for each extracted value, enabling you to trace data back to its location in the PDF. Defaults to false."
      }
    },
    "required": ["file_url", "schema_data"]
  }
}
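A sketch of constructing schema_data for this tool in Python. The invoice fields are illustrative; define whatever fields your documents actually need:

```python
import json

# Hypothetical invoice schema; field names are examples, not fixed by the API.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "total": {"type": "number", "description": "Total amount due"},
    },
    "required": ["invoice_number"],
    "additionalProperties": False,
}

tool_input = {
    "file_url": "https://example.com/invoice.pdf",  # placeholder URL
    "schema_data": json.dumps(invoice_schema),  # must be a JSON string, not a dict
    "model": "gamma",
}
```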
unsiloed_classify_document
Classify a PDF document into one of several predefined categories with confidence scoring.
{
  "name": "unsiloed_classify_document",
  "description": "Classify a PDF document into one of several predefined categories. Use this tool when the user wants to determine what type of document a PDF is (for example, invoice, receipt, contract, or medical record). You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the classification result with confidence scores. Do not use this for data extraction or document splitting.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to classify."
      },
      "categories": {
        "type": "string",
        "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices with itemized charges\"},{\"name\":\"Receipt\"},{\"name\":\"Contract\",\"description\":\"Legal agreements\"}]"
      }
    },
    "required": ["file_url", "categories"]
  }
}
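Since categories is a JSON-stringified array, build it as a Python list and serialize it. The category names below are examples, not a fixed taxonomy:

```python
import json

# Category objects: 'name' is required; 'description' is optional but
# improves accuracy. Serialize the list because the API expects a string.
categories = [
    {"name": "Invoice", "description": "Financial invoices with itemized charges"},
    {"name": "Receipt"},
    {"name": "Contract", "description": "Legal agreements"},
]
tool_input = {
    "file_url": "https://example.com/document.pdf",  # placeholder URL
    "categories": json.dumps(categories),
}
```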
unsiloed_split_document
Split a multi-document PDF into separate files by classifying each page into categories.
{
  "name": "unsiloed_split_document",
  "description": "Split a multi-document PDF into separate files by classifying each page into predefined categories. Use this tool when the user has a single PDF containing multiple document types (for example, a scanned batch of invoices, receipts, and contracts) and wants them separated into individual files. You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve download links for the split files. Do not use this for single-document classification or data extraction.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_url": {
        "type": "string",
        "description": "Publicly accessible URL to the PDF file to split."
      },
      "categories": {
        "type": "string",
        "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices\"},{\"name\":\"Contract\"},{\"name\":\"Receipt\"}]"
      }
    },
    "required": ["file_url", "categories"]
  }
}
unsiloed_get_job_result
Poll for the result of any asynchronous Unsiloed job.
{
  "name": "unsiloed_get_job_result",
  "description": "Poll for the result of an asynchronous Unsiloed job. Use this tool after calling unsiloed_parse_document, unsiloed_extract_data, unsiloed_classify_document, or unsiloed_split_document to check whether the job has completed and retrieve its results. If the status indicates the job is still processing, wait a few seconds and call this tool again. Once the job is complete, the response contains the output data. If the job failed, the response includes an error message explaining what went wrong.",
  "input_schema": {
    "type": "object",
    "properties": {
      "job_id": {
        "type": "string",
        "description": "The job_id returned by a previous Unsiloed tool call."
      },
      "job_type": {
        "type": "string",
        "enum": ["parse", "extract", "classify", "splitter"],
        "description": "The type of job to check. Use 'parse' for parsing jobs, 'extract' for extraction jobs, 'classify' for classification jobs, and 'splitter' for splitting jobs."
      }
    },
    "required": ["job_id", "job_type"]
  }
}
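The polling pattern this tool implies can be factored into a reusable helper. This is a sketch, not part of any Unsiloed SDK: the fetch callable stands in for the actual HTTP GET, and the terminal status set covers both status-name conventions described under Error Handling below.

```python
import time

def poll_job(fetch, job_type: str, job_id: str,
             interval: float = 5.0, timeout: float = 300.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses.

    `fetch` is any callable (job_type, job_id) -> dict, keeping the HTTP
    layer pluggable and the helper testable without network access.
    """
    terminal = {"Succeeded", "Failed", "completed", "failed"}
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(job_type, job_id)
        if result.get("status") in terminal:
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

With the defaults (interval=5.0, timeout=300.0) this matches the 5-minute cutoff suggested under Error Handling.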
All Tools (Copy & Paste)
Copy this complete tools array and pass it directly to your Anthropic API call:
[
  {
    "name": "unsiloed_parse_document",
    "description": "Parse a document into structured chunks with element detection, text extraction, and reading order analysis. Use this tool when the user wants to break a document into its structural components (text blocks, tables, images, headers, footers) or convert a document to markdown. You must provide a publicly accessible URL to the document. Supported formats include PDF, PNG, JPEG, TIFF, BMP, DOCX, XLSX, and PPTX. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the parsed content.",
    "input_schema": {
      "type": "object",
      "properties": {
        "url": {
          "type": "string",
          "description": "Publicly accessible URL to the document to parse."
        },
        "use_high_resolution": {
          "type": "boolean",
          "description": "Use high-resolution image processing for better OCR accuracy. Defaults to false."
        },
        "segmentation_method": {
          "type": "string",
          "enum": ["smart_layout_detection", "page_by_page"],
          "description": "Document segmentation strategy. 'smart_layout_detection' (default) groups related elements into semantic chunks. 'page_by_page' creates one chunk per page."
        },
        "ocr_mode": {
          "type": "string",
          "enum": ["auto_ocr", "full_ocr"],
          "description": "OCR strategy. 'auto_ocr' (default) only runs OCR when needed. 'full_ocr' forces OCR on all pages."
        },
        "merge_tables": {
          "type": "boolean",
          "description": "Merge adjacent table segments into a single table. Defaults to false."
        },
        "segment_filter": {
          "type": "string",
          "description": "Filter output to include only specific segment types. Accepts comma-separated values (e.g., 'table', 'picture', 'table,picture') or 'all' for everything. Defaults to 'all'."
        },
        "xml_citation": {
          "type": "boolean",
          "description": "Enable citation extraction from PDF documents. Extracts structured bibliography and in-text citations in markdown output. Defaults to false."
        },
        "output_fields": {
          "type": "string",
          "description": "JSON object controlling which fields are included in the response. Set fields to false to exclude them and reduce response size. Available fields: html, markdown, ocr, image, content, bbox, confidence, embed. All default to true. Example: {\"html\":true,\"markdown\":true,\"ocr\":false,\"image\":false}"
        },
        "segment_analysis": {
          "type": "string",
          "description": "JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Example: {\"Table\":{\"html\":\"VLM\",\"markdown\":\"VLM\"},\"Picture\":{\"html\":\"VLM\",\"markdown\":\"VLM\"}}"
        },
        "page_range": {
          "type": "string",
          "description": "Specify which pages to process. Formats: '1-5', '2,4,6', or '[1,3,5]'. Defaults to all pages."
        },
        "segment_type_naming": {
          "type": "string",
          "enum": ["Unsiloed", "Other"],
          "description": "Segment type naming convention. 'Unsiloed' (default) uses names like PageHeader, ListItem, Picture. 'Other' uses alternative names like Header, List Item, Figure."
        },
        "detect_checkboxes": {
          "type": "boolean",
          "description": "Detect and identify checkboxes in the document with their bounding box locations. Defaults to false."
        },
        "extract_charts": {
          "type": "boolean",
          "description": "Extract structured data from charts and graphs, including data points and chart type information. Defaults to false."
        },
        "extract_colors": {
          "type": "boolean",
          "description": "Transfer text color from the PDF text layer to OCR results. Defaults to false."
        },
        "extract_links": {
          "type": "boolean",
          "description": "Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false."
        },
        "export_format": {
          "type": "string",
          "description": "JSON array of export format(s) to generate after parsing. The exported files are available as presigned URLs in the exports field of the response. Currently supported: [\"docx\"]. Example: [\"docx\"]"
        },
        "error_handling": {
          "type": "string",
          "enum": ["Continue", "Fail"],
          "description": "How to handle per-page errors. 'Continue' (default) skips failed pages and continues. 'Fail' aborts the entire job on the first error."
        },
        "expires_in": {
          "type": "integer",
          "description": "Seconds until the task and its output are automatically deleted."
        },
        "chunk_processing": {
          "type": "string",
          "description": "JSON object for chunk processing configuration."
        },
        "llm_processing": {
          "type": "string",
          "description": "JSON object for LLM processing configuration."
        }
      },
      "required": ["url"]
    }
  },
  {
    "name": "unsiloed_extract_data",
    "description": "Extract structured data from a PDF document using a custom JSON schema. Use this tool when the user wants to pull specific fields (such as invoice numbers, names, dates, or line items) out of a PDF. You must provide a publicly accessible URL to the PDF and a JSON schema defining the fields to extract. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the extracted data. Do not use this tool for document classification or splitting.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to process."
        },
        "schema_data": {
          "type": "string",
          "description": "A JSON-stringified schema defining the fields to extract. Example: {\"type\":\"object\",\"properties\":{\"invoice_number\":{\"type\":\"string\",\"description\":\"The invoice number\"},\"total\":{\"type\":\"number\",\"description\":\"Total amount\"}},\"required\":[\"invoice_number\"],\"additionalProperties\":false}"
        },
        "model": {
          "type": "string",
          "enum": ["alpha", "beta", "gamma", "delta"],
          "description": "Model tier for extraction. Default is 'gamma', recommended for most use cases."
        },
        "enable_citations": {
          "type": "boolean",
          "description": "When true, returns bounding box coordinates for each extracted value, enabling you to trace data back to its location in the PDF. Defaults to false."
        }
      },
      "required": ["file_url", "schema_data"]
    }
  },
  {
    "name": "unsiloed_classify_document",
    "description": "Classify a PDF document into one of several predefined categories. Use this tool when the user wants to determine what type of document a PDF is (for example, invoice, receipt, contract, or medical record). You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve the classification result with confidence scores. Do not use this for data extraction or document splitting.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to classify."
        },
        "categories": {
          "type": "string",
          "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices with itemized charges\"},{\"name\":\"Receipt\"},{\"name\":\"Contract\",\"description\":\"Legal agreements\"}]"
        }
      },
      "required": ["file_url", "categories"]
    }
  },
  {
    "name": "unsiloed_split_document",
    "description": "Split a multi-document PDF into separate files by classifying each page into predefined categories. Use this tool when the user has a single PDF containing multiple document types (for example, a scanned batch of invoices, receipts, and contracts) and wants them separated into individual files. You must provide a publicly accessible URL to the PDF and a list of candidate categories. The operation is asynchronous — it returns a job_id that you must poll using unsiloed_get_job_result to retrieve download links for the split files. Do not use this for single-document classification or data extraction.",
    "input_schema": {
      "type": "object",
      "properties": {
        "file_url": {
          "type": "string",
          "description": "Publicly accessible URL to the PDF file to split."
        },
        "categories": {
          "type": "string",
          "description": "A JSON-stringified array of category objects. Each object must have a 'name' field and may have an optional 'description' field for better accuracy. Example: [{\"name\":\"Invoice\",\"description\":\"Financial invoices\"},{\"name\":\"Contract\"},{\"name\":\"Receipt\"}]"
        }
      },
      "required": ["file_url", "categories"]
    }
  },
  {
    "name": "unsiloed_get_job_result",
    "description": "Poll for the result of an asynchronous Unsiloed job. Use this tool after calling unsiloed_parse_document, unsiloed_extract_data, unsiloed_classify_document, or unsiloed_split_document to check whether the job has completed and retrieve its results. If the status indicates the job is still processing, wait a few seconds and call this tool again. Once the job is complete, the response contains the output data. If the job failed, the response includes an error message explaining what went wrong.",
    "input_schema": {
      "type": "object",
      "properties": {
        "job_id": {
          "type": "string",
          "description": "The job_id returned by a previous Unsiloed tool call."
        },
        "job_type": {
          "type": "string",
          "enum": ["parse", "extract", "classify", "splitter"],
          "description": "The type of job to check. Use 'parse' for parsing jobs, 'extract' for extraction jobs, 'classify' for classification jobs, and 'splitter' for splitting jobs."
        }
      },
      "required": ["job_id", "job_type"]
    }
  }
]
Usage with the Anthropic SDK
Here’s how to register these tools and handle Claude’s tool calls:
import anthropic
import requests
import json
import time

UNSILOED_API_KEY = "your-unsiloed-api-key"
UNSILOED_BASE_URL = "https://prod.visionapi.unsiloed.ai"
UNSILOED_HEADERS = {"api-key": UNSILOED_API_KEY}

# Load the tools array from the copy-paste block above
tools = [...]  # Paste the full tools array here

client = anthropic.Anthropic(api_key="your-anthropic-api-key")

def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Execute an Unsiloed API tool call and return the result as a string."""
    if tool_name == "unsiloed_parse_document":
        data = {"url": tool_input["url"]}
        if "use_high_resolution" in tool_input:
            data["use_high_resolution"] = str(tool_input["use_high_resolution"]).lower()
        if "segmentation_method" in tool_input:
            data["segmentation_method"] = tool_input["segmentation_method"]
        if "ocr_mode" in tool_input:
            data["ocr_mode"] = tool_input["ocr_mode"]
        if "merge_tables" in tool_input:
            data["merge_tables"] = str(tool_input["merge_tables"]).lower()
        if "segment_filter" in tool_input:
            data["segment_filter"] = tool_input["segment_filter"]
        if "xml_citation" in tool_input:
            data["xml_citation"] = str(tool_input["xml_citation"]).lower()
        resp = requests.post(f"{UNSILOED_BASE_URL}/parse", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_extract_data":
        data = {
            "file_url": tool_input["file_url"],
            "schema_data": tool_input["schema_data"],
        }
        if "model" in tool_input:
            data["model"] = tool_input["model"]
        if "enable_citations" in tool_input:
            data["enable_citations"] = str(tool_input["enable_citations"]).lower()
        resp = requests.post(f"{UNSILOED_BASE_URL}/v2/extract", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_classify_document":
        data = {
            "file_url": tool_input["file_url"],
            "categories": tool_input["categories"],
        }
        resp = requests.post(f"{UNSILOED_BASE_URL}/classify", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_split_document":
        data = {
            "file_url": tool_input["file_url"],
            "categories": tool_input["categories"],
        }
        resp = requests.post(f"{UNSILOED_BASE_URL}/splitter", headers=UNSILOED_HEADERS, data=data)
    elif tool_name == "unsiloed_get_job_result":
        job_type = tool_input["job_type"]
        job_id = tool_input["job_id"]
        time.sleep(5)  # Brief pause before polling
        resp = requests.get(f"{UNSILOED_BASE_URL}/{job_type}/{job_id}", headers=UNSILOED_HEADERS)
    else:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    return json.dumps(resp.json())
Complete Agentic Loop Example
This standalone example shows a full autonomous loop where Claude processes a document end-to-end:
import anthropic
import requests
import json
import time

# --- Configuration ---
ANTHROPIC_API_KEY = "your-anthropic-api-key"
UNSILOED_API_KEY = "your-unsiloed-api-key"
UNSILOED_BASE_URL = "https://prod.visionapi.unsiloed.ai"
UNSILOED_HEADERS = {"api-key": UNSILOED_API_KEY}

# --- Load tools (paste the full tools array from above) ---
tools = [...]  # Paste the complete tools JSON array here

# --- Tool executor ---
def process_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "unsiloed_extract_data":
        resp = requests.post(
            f"{UNSILOED_BASE_URL}/v2/extract",
            headers=UNSILOED_HEADERS,
            data={
                "file_url": tool_input["file_url"],
                "schema_data": tool_input["schema_data"],
                **({"model": tool_input["model"]} if "model" in tool_input else {}),
            },
        )
    elif tool_name == "unsiloed_get_job_result":
        time.sleep(5)
        resp = requests.get(
            f"{UNSILOED_BASE_URL}/{tool_input['job_type']}/{tool_input['job_id']}",
            headers=UNSILOED_HEADERS,
        )
    # Add other tool handlers (parse, classify, split) as needed
    else:
        return json.dumps({"error": f"Unhandled tool: {tool_name}"})
    return json.dumps(resp.json())

# --- Agentic loop ---
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
messages = [
    {
        "role": "user",
        "content": "Extract the invoice number, date, and total amount from this PDF: https://example.com/invoice.pdf",
    }
]

print("Starting agentic loop...\n")
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )

    # Collect assistant response
    messages.append({"role": "assistant", "content": response.content})

    # If Claude is done, print the final text and exit
    if response.stop_reason == "end_turn":
        for block in response.content:
            if hasattr(block, "text"):
                print(f"Claude: {block.text}")
        break

    # If Claude wants to use tools, execute them
    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f" -> Calling {block.name}({json.dumps(block.input, indent=2)[:100]}...)")
                result = process_tool_call(block.name, block.input)
                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    }
                )
        messages.append({"role": "user", "content": tool_results})

print("\nDone.")
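One practical refinement when running the loop in production: large parse outputs can exhaust the model’s context window when fed back as tool results. A hypothetical guard (not part of the example above; the 50,000-character limit is an arbitrary illustration) that caps tool_result payloads before appending them:

```python
def truncate_result(result: str, limit: int = 50_000) -> str:
    """Cap a tool_result payload so oversized parse outputs don't blow up the context."""
    if len(result) <= limit:
        return result
    return result[:limit] + f"... [truncated {len(result) - limit} characters]"
```

Apply it where the loop builds tool_results, e.g. "content": truncate_result(result).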
Error Handling
When integrating with the Unsiloed API, handle these common scenarios:
Job failure
If a job’s status is "Failed" or "failed", the response includes an error message. Parse jobs use capitalized statuses (Succeeded, Failed), while extraction, classification, and splitting jobs use lowercase (completed, failed).
{
  "status": "failed",
  "error": "Error processing document: unsupported file format"
}
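Because the two status conventions differ only in capitalization, a small normalizer keeps calling code uniform. This is a sketch, not an official helper:

```python
def job_state(response: dict) -> str:
    """Map both status conventions ('Succeeded'/'completed', 'Failed'/'failed')
    onto three coarse states: 'done', 'error', or 'pending'."""
    status = str(response.get("status", "")).lower()
    if status in ("succeeded", "completed"):
        return "done"
    if status == "failed":
        return "error"
    return "pending"
```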
Authentication errors
A 401 response means the API key is missing or invalid. Ensure you pass the key in the api-key header.
{
  "detail": "API key is required. Please provide 'api-key' in the request header."
}
Quota exceeded
A 402 response means your organization has run out of credits. Check the quota_remaining field in successful responses to monitor usage proactively.
{
  "detail": {
    "message": "Insufficient quota",
    "status": "QUOTA_EXCEEDED"
  }
}
Polling timeout
If a job hasn’t completed after 5 minutes of polling, treat it as a timeout. Jobs rarely take longer than 2 minutes for standard documents.
Best Practices
- Use descriptive categories — When classifying or splitting, add description fields to your category objects. This significantly improves accuracy.
- Poll with backoff — Wait 5-10 seconds between unsiloed_get_job_result calls. Tight polling wastes quota and adds no benefit.
- Use publicly accessible URLs — Claude cannot upload binary files. Use presigned URLs from your cloud storage or any publicly accessible link.
- Keep extraction schemas focused — Smaller, targeted extraction schemas produce better results than large catch-all schemas. Extract only what you need.
- Handle errors gracefully — Always check job status before processing results. Return clear error messages so Claude can inform the user.
- Monitor your quota — Check the quota_remaining field in API responses and alert when running low.
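The "poll with backoff" advice can be sketched as a generator of wait times to pair with time.sleep between unsiloed_get_job_result calls. The base, cap, and growth factor are illustrative defaults, not API requirements:

```python
def backoff_intervals(base: float = 5.0, cap: float = 30.0, factor: float = 1.5):
    """Yield successively longer wait times, capped so polls never stall too long."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```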
Next Steps
Parsing
Learn about document parsing and structure analysis
Classification
Explore document classification with confidence scoring
Splitting
Split multi-document PDFs into separate files
API Reference
Full API reference with all endpoints and parameters