We’ll walk through extracting structured fields from a single document end to end. For parsing documents into Markdown chunks instead, see the Parse Quickstart.

Before You Start

  • Get an Unsiloed AI API key by signing up.
  • Have a document to extract fields from (PDF, DOCX, PPTX, JPG, PNG, etc.).
  • Know what fields you want. See the Schemas reference for the JSON Schema rules.
  • Optionally generate the schema from the Unsiloed dashboard instead of writing it by hand.

Keep your API key out of source control. The examples below read it from the UNSILOED_API_KEY environment variable.
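Before submitting a hand-written schema, it can help to catch simple inconsistencies locally. The sketch below is not the service's validation and does not encode the rules in the Schemas reference; it only checks one generic JSON Schema property, namely that every name listed under required is also declared under properties. The function name check_schema is hypothetical.

```python
def check_schema(schema: dict, path: str = "$") -> list[str]:
    """Return a list of problems found in the schema (empty if none)."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        # Every required name should have a matching property definition.
        for name in schema.get("required", []):
            if name not in properties:
                problems.append(
                    f"{path}: required field '{name}' is not declared in properties"
                )
        # Recurse into nested objects and arrays.
        for name, sub_schema in properties.items():
            problems.extend(check_schema(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(check_schema(schema["items"], f"{path}[]"))
    return problems


# A deliberately inconsistent schema to show what gets reported.
draft = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"amount": {"type": "number"}},
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total_amount_due"],
}

problems = check_schema(draft)
for problem in problems:
    print(problem)
```

An empty list means this particular check passed; the API may still reject the schema for other reasons.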

1. Submit a Document With a Schema

The /v2/extract endpoint accepts a multipart upload with two fields: pdf_file (the document) and schema_data (the JSON schema as a string). It returns a job_id you can poll for results. All requests go to https://prod.visionapi.unsiloed.ai with your key in the api-key header.
The JavaScript example uses ES modules (top-level await, import). Save it as script.mjs or add "type": "module" to your package.json. You’ll also need Node.js 18 or newer, which exposes fetch, FormData, and Blob as globals.
import os
import json
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Invoice number"
        },
        "invoice_date": {
            "type": "string",
            "description": "Date the invoice was issued"
        },
        "vendor_name": {
            "type": "string",
            "description": "Legal name of the company issuing the invoice"
        },
        "customer_name": {
            "type": "string",
            "description": "Name of the customer being billed"
        },
        "line_items": {
            "type": "array",
            "description": "Each row of services billed on the invoice",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Description of the service"},
                    "hours": {"type": "number", "description": "Hours billed"},
                    "rate": {"type": "number", "description": "Hourly rate in USD"},
                    "amount": {"type": "number", "description": "Line total in USD"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
        "total_amount_due": {
            "type": "number",
            "description": "Final total amount due in USD"
        },
    },
    "required": ["invoice_number", "invoice_date", "vendor_name", "total_amount_due"],
    "additionalProperties": False,
}

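# Upload the document and the schema together as one multipart request.
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v2/extract",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("invoice.pdf", f, "application/pdf")},
        data={"schema_data": json.dumps(schema)},  # the schema travels as a JSON string
        timeout=60,  # avoid hanging indefinitely on a stalled connection
    )
response.raise_for_status()

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")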
The response contains the job_id you need for the next step:
{
  "job_id": "9316e199-82bb-4111-b559-91d81e41cc4f",
  "status": "processing"
}
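The upload above hardcodes the filename and the application/pdf content type. For the other document types listed under Before You Start, one option is to derive both from the file path. This is a sketch under an assumption: the source only shows a PDF upload, so whether the endpoint relies on the declared content type for other formats is not confirmed here. The helper name upload_part is hypothetical; mimetypes and pathlib are from the Python standard library.

```python
import mimetypes
from pathlib import Path


def upload_part(path: str) -> tuple[str, str]:
    """Return (filename, content_type) for the multipart file field."""
    content_type, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown.
    return Path(path).name, content_type or "application/octet-stream"


name, content_type = upload_part("scans/receipt.png")
print(name, content_type)
```

The two values would replace the literals in the files argument, for example files={"pdf_file": (name, f, content_type)}.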

2. Poll for Results

Polling GET /extract/{job_id} returns the job’s current state. A status of completed indicates the result is ready, failed indicates the job errored, and any other value (such as processing) means the job is still running.
import time

while True:
    poll = requests.get(
        f"{BASE_URL}/extract/{job_id}",
        headers={"api-key": API_KEY},
        timeout=30,
    )
    poll.raise_for_status()  # surface HTTP errors before reading the body
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "extract job failed"))
    time.sleep(5)  # wait before polling again

for field_name, field_data in result["result"].items():
    print(f"{field_name}: {field_data['value']} (score={field_data['score']:.2%})")
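The loop above runs until the job reaches completed or failed, so a job that never finishes would keep it polling forever. A minimal sketch of the same status logic with an overall deadline follows. The helper wait_for_job and its parameters are hypothetical, the 300-second default is an arbitrary choice rather than a documented limit, and the HTTP call is passed in as a function so the logic can be exercised without network access.

```python
import time


def wait_for_job(fetch, timeout_s=300.0, interval_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the job completes, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "extract job failed"))
        # Any other status means the job is still running.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for the HTTP call: two "processing" replies, then "completed".
replies = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "result": {}},
])
final = wait_for_job(lambda: next(replies), sleep=lambda seconds: None)
print(final["status"])
```

In real use, fetch would wrap the GET request from the loop above and return its parsed JSON body.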

Response Shape

A completed extraction response contains job metadata plus a result object with one entry per schema field. Each entry has the extracted value and a score between 0 and 1. For array fields, the array itself has a score, and each property inside the array’s objects carries its own as well.
{
  "job_id": "283c77a1-ae1b-4b96-b89e-1c5e63fd89aa",
  "status": "completed",
  "file_name": "invoice.pdf",
  "file_url": "https://platform-unsiloed-ai.s3.amazonaws.com/...",
  "created_at": "2026-05-25T15:38:11.821Z",
  "updated_at": "2026-05-25T15:38:42.196Z",
  "metadata": {
    "page_count": 1,
    "order": ["invoice_number", "invoice_date", "vendor_name", "customer_name", "line_items", "total_amount_due"],
    "schema": { "...": "..." }
  },
  "result": {
    "invoice_number": {
      "value": "NC-2025-00417",
      "score": 0.97
    },
    "invoice_date": {
      "value": "March 15, 2025",
      "score": 0.98
    },
    "vendor_name": {
      "value": "NORTHWIND CONSULTING LLC",
      "score": 0.95
    },
    "customer_name": {
      "value": "Acme Industries, Inc.",
      "score": 0.95
    },
    "total_amount_due": {
      "value": 12586.0,
      "score": 0.97
    },
    "line_items": {
      "score": 0.99,
      "value": [
        {
          "description": { "value": "Strategic planning workshop facilitation", "score": 0.98 },
          "hours":  { "value": 12,     "score": 0.97 },
          "rate":   { "value": 275.0,  "score": 0.97 },
          "amount": { "value": 3300.0, "score": 0.98 }
        },
        "...three more rows..."
      ]
    }
  }
}
See the Response Format reference for the full field-by-field breakdown.
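Because every field is wrapped in a value and score pair, downstream code often wants two things: the plain values, and a list of entries worth a manual review. The sketch below assumes the shape shown above. The helper names unwrap and low_scores are hypothetical, and the 0.96 threshold is an arbitrary illustration rather than a recommended cutoff.

```python
def is_wrapper(node):
    # Assumption: a dict holding both keys is a {value, score} wrapper.
    # A schema field literally named "value" or "score" could confuse this.
    return isinstance(node, dict) and "value" in node and "score" in node


def unwrap(node):
    """Strip the {value, score} wrappers, returning plain Python data."""
    if is_wrapper(node):
        return unwrap(node["value"])
    if isinstance(node, dict):
        return {key: unwrap(child) for key, child in node.items()}
    if isinstance(node, list):
        return [unwrap(child) for child in node]
    return node


def low_scores(node, threshold, path=""):
    """Yield (path, score) for every wrapped entry scoring below threshold."""
    if is_wrapper(node):
        if node["score"] < threshold:
            yield path, node["score"]
        yield from low_scores(node["value"], threshold, path)
    elif isinstance(node, dict):
        for key, child in node.items():
            yield from low_scores(child, threshold, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from low_scores(child, threshold, f"{path}[{index}]")


# A trimmed copy of the result object from the response above.
result = {
    "invoice_number": {"value": "NC-2025-00417", "score": 0.97},
    "vendor_name": {"value": "NORTHWIND CONSULTING LLC", "score": 0.95},
    "line_items": {
        "score": 0.99,
        "value": [
            {"hours": {"value": 12, "score": 0.97},
             "amount": {"value": 3300.0, "score": 0.98}},
        ],
    },
}

plain = unwrap(result)
flagged = list(low_scores(result, threshold=0.96))
print(plain)
print(flagged)
```

With the full response, the same calls would take result["result"] from the polling step.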

Next Steps

  • Schemas: JSON Schema rules and worked examples for invoices and SEC filings.
  • Response Format: The canonical extraction response shape with a field-by-field reference.
  • API Reference: Browse the full request and response specs for /v2/extract.
  • FAQ: Check limits, supported formats, and answers to common questions.