Getting Started With Extract

We’ll walk through extracting structured fields from a single document end to end. For parsing documents into Markdown chunks instead, see the Parse Quickstart.

Before You Start

Get an Unsiloed AI API key by signing up.
Have a document to extract fields from (PDF, DOCX, PPTX, JPG, PNG, etc.).
Know what fields you want. See the Schemas reference for the JSON Schema rules.
Optionally generate the schema from the Unsiloed dashboard instead of writing it by hand.
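If you write the schema by hand, a quick local check can catch typos before you upload anything. The sketch below is not the Unsiloed validator and does not encode the rules from the Schemas reference. It only assumes two generic JSON Schema conventions: every name in required should appear in properties, and every array should declare items. The function name check_schema is hypothetical.

```python
def check_schema(schema: dict, path: str = "schema") -> list[str]:
    """Return human-readable problems found in a schema; an empty list means none.

    Assumption: these are generic consistency checks, not the API's own rules.
    """
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties")
        if not isinstance(props, dict) or not props:
            problems.append(f"{path}: object has no properties")
            props = {}
        # Every required name should be declared under properties.
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required field '{name}' is not in properties")
        # Recurse into nested objects and arrays.
        for name, sub in props.items():
            if isinstance(sub, dict):
                problems.extend(check_schema(sub, f"{path}.{name}"))
    elif schema.get("type") == "array":
        items = schema.get("items")
        if not isinstance(items, dict):
            problems.append(f"{path}: array has no items schema")
        else:
            problems.extend(check_schema(items, f"{path}[]"))
    return problems
```

Calling check_schema(schema) on the invoice schema used in this guide should return an empty list. Any message it returns names the path of the offending field.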

Keep your API key out of source control. The examples below read it from the UNSILOED_API_KEY environment variable.

1. Submit a Document With a Schema

The /v2/extract endpoint accepts a multipart upload with two fields: pdf_file (the document) and schema_data (the JSON schema as a string). It returns a job_id you can poll for results. All requests go to https://prod.visionapi.unsiloed.ai with your key in the api-key header.

The JavaScript example uses ES modules (top-level await, import). Save it as script.mjs or add "type": "module" to your package.json. You’ll also need Node.js 18 or newer, which exposes fetch, FormData, and Blob as globals.

import os
import json
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Invoice number"
        },
        "invoice_date": {
            "type": "string",
            "description": "Date the invoice was issued"
        },
        "vendor_name": {
            "type": "string",
            "description": "Legal name of the company issuing the invoice"
        },
        "customer_name": {
            "type": "string",
            "description": "Name of the customer being billed"
        },
        "line_items": {
            "type": "array",
            "description": "Each row of services billed on the invoice",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Description of the service"},
                    "hours": {"type": "number", "description": "Hours billed"},
                    "rate": {"type": "number", "description": "Hourly rate in USD"},
                    "amount": {"type": "number", "description": "Line total in USD"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
        "total_amount_due": {
            "type": "number",
            "description": "Final total amount due in USD"
        },
    },
    "required": ["invoice_number", "invoice_date", "vendor_name", "total_amount_due"],
    "additionalProperties": False,
}

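# Upload the document and the schema in one multipart request.
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v2/extract",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("invoice.pdf", f, "application/pdf")},
        data={"schema_data": json.dumps(schema)},  # schema is sent as a JSON string
        timeout=60,  # avoid hanging forever on a stalled connection
    )
response.raise_for_status()  # stop here on any 4xx/5xx response

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")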

The response contains the job_id you need for the next step:

{
  "job_id": "9316e199-82bb-4111-b559-91d81e41cc4f",
  "status": "processing"
}
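The upload above hard-codes a PDF file name and content type. If you submit other formats, one option is to derive both from the file path. This is a minimal sketch under an assumption the page does not confirm: that non-PDF documents are sent in the same pdf_file part with a content type matching the file. The helper name upload_field is hypothetical.

```python
import mimetypes
from pathlib import Path


def upload_field(path: str) -> tuple[str, str]:
    """Return (file name, content type) for the multipart file part.

    Falls back to application/octet-stream when the extension is unknown.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return Path(path).name, content_type or "application/octet-stream"
```

You could then build the upload as files={"pdf_file": (name, f, content_type)}, where name and content_type come from upload_field(path).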

2. Poll for Results

Poll GET /extract/{job_id} with the job_id from step 1 to check the job’s current state. A status of completed means the result is ready, failed means the job errored, and any other value (such as processing) means the job is still running.

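import time

# Poll every 5 seconds until the job reaches a terminal state.
while True:
    response = requests.get(
        f"{BASE_URL}/extract/{job_id}",
        headers={"api-key": API_KEY},
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors before reading the body
    result = response.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "extract job failed"))
    time.sleep(5)

# Print each top-level field with its extracted value and confidence score.
for field_name, field_data in result["result"].items():
    print(f"{field_name}: {field_data['value']} (score={field_data['score']:.2%})")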
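A bare while True loop never gives up if a job stalls. The sketch below wraps the same status checks in a helper with an overall deadline. The name wait_for_job, the 300-second default, and the injectable fetch_status and sleep callables are all illustrative choices, not part of the Unsiloed API. Passing the fetch step in as a function keeps the helper testable without network access.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,   # assumed overall budget in seconds
    interval: float = 5.0,    # same delay as the loop in this guide
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "extract job failed"))
        # Give up if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


# Example wiring (requires the variables defined earlier in this guide):
# result = wait_for_job(lambda: requests.get(
#     f"{BASE_URL}/extract/{job_id}", headers={"api-key": API_KEY}, timeout=30
# ).json())
```

Any status other than completed or failed is treated as still running, which matches the description of the polling endpoint in this step.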

Response Shape

A completed extraction response contains job metadata plus a result object with one entry per schema field. Each entry has the extracted value and a score between 0 and 1. For array fields, the array itself has a score, and each property inside the array’s objects carries its own value and score.

{
  "job_id": "283c77a1-ae1b-4b96-b89e-1c5e63fd89aa",
  "status": "completed",
  "file_name": "invoice.pdf",
  "file_url": "https://platform-unsiloed-ai.s3.amazonaws.com/...",
  "created_at": "2026-05-25T15:38:11.821Z",
  "updated_at": "2026-05-25T15:38:42.196Z",
  "metadata": {
    "page_count": 1,
    "order": ["invoice_number", "invoice_date", "vendor_name", "customer_name", "line_items", "total_amount_due"],
    "schema": { "...": "..." }
  },
  "result": {
    "invoice_number": {
      "value": "NC-2025-00417",
      "score": 0.97
    },
    "invoice_date": {
      "value": "March 15, 2025",
      "score": 0.98
    },
    "vendor_name": {
      "value": "NORTHWIND CONSULTING LLC",
      "score": 0.95
    },
    "customer_name": {
      "value": "Acme Industries, Inc.",
      "score": 0.95
    },
    "total_amount_due": {
      "value": 12586.0,
      "score": 0.97
    },
    "line_items": {
      "score": 0.99,
      "value": [
        {
          "description": { "value": "Strategic planning workshop facilitation", "score": 0.98 },
          "hours":  { "value": 12,     "score": 0.97 },
          "rate":   { "value": 275.0,  "score": 0.97 },
          "amount": { "value": 3300.0, "score": 0.98 }
        },
        "...three more rows..."
      ]
    }
  }
}
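Downstream code often wants the extracted values without the score wrappers. Assuming the shape shown above, where every scored entry is an object with both a value and a score key, a recursive helper can unwrap the result. The name plain_values is hypothetical. The sketch would misread a schema whose own fields are literally named value and score.

```python
def plain_values(node):
    """Recursively replace {"value": ..., "score": ...} wrappers with the bare value."""
    if isinstance(node, dict):
        if "value" in node and "score" in node:
            # A scored entry: keep only the value, which may itself be nested.
            return plain_values(node["value"])
        return {key: plain_values(child) for key, child in node.items()}
    if isinstance(node, list):
        return [plain_values(item) for item in node]
    return node  # strings, numbers, booleans, None
```

Applied to result["result"], this yields a plain dictionary in which line_items is a list of ordinary row objects.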

See the Response Format reference for the full field-by-field breakdown.
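Because every entry carries a score, you can flag low-scoring fields for manual review. The sketch below walks the result and reports each scored entry under a cutoff. It assumes the value and score wrapper shape shown in the sample response. The 0.9 default is an arbitrary illustration, not a threshold recommended by the documentation, and the name low_confidence is hypothetical.

```python
def low_confidence(node, threshold=0.9, path=""):
    """Yield (path, score) for every scored entry whose score is below threshold."""
    if isinstance(node, dict):
        if "value" in node and "score" in node:
            if node["score"] < threshold:
                yield path, node["score"]
            # Array fields nest further scored entries inside their value.
            yield from low_confidence(node["value"], threshold, path)
        else:
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else key
                yield from low_confidence(child, threshold, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from low_confidence(item, threshold, f"{path}[{index}]")
```

For example, list(low_confidence(result["result"], threshold=0.96)) returns pairs such as a field path and its score, with array rows addressed like line_items[0].hours.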

Next Steps

Schemas

JSON Schema rules and worked examples for invoices and SEC filings.

Response Format

The canonical extraction response shape with a field-by-field reference.

API Reference

Browse the full request and response specs for /v2/extract.

FAQ

Check limits, supported formats, and answers to common questions.
