Every extraction schema follows JSON Schema with strict-mode rules. These rules apply at every level (the root, every nested object, and every array’s items definition) and keep the output deterministic and well-typed.
Prefer not to write JSON by hand? The Unsiloed dashboard has a schema builder with Manual and Auto-Suggest modes. In Auto-Suggest, describe the fields you want, upload an example document, and the dashboard generates a schema you can export and pass to /v2/extract.

Core Requirements

1. Root Object: Every schema starts with "type": "object". Arrays and primitives aren’t allowed at the top level.
{
  "type": "object",
  "properties": {
    // Define your fields here
  },
  "required": [...],
  "additionalProperties": false
}
2. Properties: Define every field you want to extract under the "properties" key. Each field must specify a "type" and should include a clear "description".
{
  "type": "object",
  "properties": {
    "field_name_1": {
      "type": "string",
      "description": "Clear description of what to extract"
    },
    "field_name_2": {
      "type": "number",
      "description": "Description with units or context"
    }
  },
  "required": [...],
  "additionalProperties": false
}
3. Required Fields: Specify mandatory fields using the "required" array. Field names must exactly match those defined in "properties".
{
  "type": "object",
  "properties": {
    "mandatory_field": { "type": "string", "description": "This field is required" },
    "another_required_field": { "type": "string", "description": "This is also required" }
  },
  "required": ["mandatory_field", "another_required_field"],
  "additionalProperties": false
}
4. Additional Properties: Always set "additionalProperties": false at every object level, including objects inside array "items", so that only the specified fields appear in the output.
{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "field_name": { "type": "string" }
        },
        "required": [...],
        "additionalProperties": false  // Required in array items
      }
    }
  },
  "required": [...],
  "additionalProperties": false  // Required at root level
}
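The four rules above can be checked mechanically before a schema is submitted. The following is a minimal sketch in Python, not part of the Unsiloed API: `check_strict` is a hypothetical helper, and it assumes the schema has already been loaded into a Python dict (for example with `json.loads`).

```python
def check_strict(schema, path="root"):
    """Return a list of rule violations found in an extraction schema.

    Hypothetical helper: it only covers the four core requirements
    described above and walks nested objects and array items.
    """
    problems = []
    kind = schema.get("type")
    # Rule 1: the top level must be an object.
    if path == "root" and kind != "object":
        problems.append(f"{path}: top-level type must be 'object'")
    if kind == "object":
        props = schema.get("properties", {})
        # Rule 4: additionalProperties must be exactly false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Rule 3: every required name must exist in properties.
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required field '{name}' is not in properties")
        # Rule 2: every field needs a type; recurse into nested definitions.
        for name, sub in props.items():
            if "type" not in sub:
                problems.append(f"{path}.{name}: missing type")
            problems.extend(check_strict(sub, f"{path}.{name}"))
    elif kind == "array":
        if "items" not in schema:
            problems.append(f"{path}: array must define items")
        else:
            problems.extend(check_strict(schema["items"], f"{path}[]"))
    return problems


good = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice number"}
    },
    "required": ["invoice_number"],
    "additionalProperties": False,
}

# Missing additionalProperties at two levels, plus a required name with no property.
bad = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"field_name": {"type": "string"}},
                "required": ["missing"],
            },
        }
    },
    "required": [],
}

print(check_strict(good))  # []
for problem in check_strict(bad):
    print(problem)
```

Returning a list of messages rather than raising on the first error makes it easier to fix a schema in one pass.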

Supported Types

Extraction schemas support four field types:
String: For text, dates, IDs, names, addresses, and any textual data
{
  "field_name": {
    "type": "string",
    "description": "Description of the text field"
  }
}
Number: For integers and decimals like prices, quantities, counts, and measurements
{
  "field_name": {
    "type": "number",
    "description": "Description with units (e.g., USD, kg)"
  }
}
Boolean: For true/false values such as status flags and yes/no fields
{
  "field_name": {
    "type": "boolean",
    "description": "Description of the boolean condition"
  }
}
Array: For repeating items like line items or lists. Each array must include "items" to define the structure of its elements.
{
  "field_name": {
    "type": "array",
    "description": "Description of the array items",
    "items": {
      "type": "object",
      "properties": {
        "item_field1": { "type": "string", "description": "..." },
        "item_field2": { "type": "string", "description": "..." }
      },
      "required": [...],
      "additionalProperties": false
    }
  }
}
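A schema can be scanned for types outside this list before it is used. The sketch below is an illustration under assumptions, not an official validator: `find_unsupported_types` and `FIELD_TYPES` are hypothetical names, and "object" is treated as an allowed container because the examples on this page use it for the root, for array items, and for nested groups.

```python
FIELD_TYPES = {"string", "number", "boolean", "array"}


def find_unsupported_types(schema, path="root"):
    """Yield (path, type) pairs for fields whose type is not in FIELD_TYPES.

    Objects and arrays are walked rather than reported. An array without
    "items" is reported with a type of None.
    """
    kind = schema.get("type")
    if kind == "object":
        for name, sub in schema.get("properties", {}).items():
            yield from find_unsupported_types(sub, f"{path}.{name}")
    elif kind == "array":
        yield from find_unsupported_types(schema.get("items", {}), f"{path}[]")
    elif kind not in FIELD_TYPES:
        yield path, kind


candidate = {
    "type": "object",
    "properties": {
        "count": {"type": "integer"},  # not in the list; "number" covers integers
        "tags": {"type": "array", "items": {"type": "string"}},
        "name": {"type": "string"},
    },
}

print(list(find_unsupported_types(candidate)))  # [('root.count', 'integer')]
```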

Building Schemas

Primitive Types

Primitive fields use string, number, or boolean as their type. These are the building blocks of your schema.
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number"
    },
    "total_amount": {
      "type": "number",
      "description": "Total amount in USD"
    },
    "is_paid": {
      "type": "boolean",
      "description": "Payment status"
    }
  },
  "required": ["invoice_number", "total_amount", "is_paid"],
  "additionalProperties": false
}

Arrays of Objects

Use arrays when you have repeating data like line items, transactions, or people.
{
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "description": "Invoice line items",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string",
            "description": "Item description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity"
          },
          "price": {
            "type": "number",
            "description": "Unit price"
          }
        },
        "required": ["description", "quantity", "price"],
        "additionalProperties": false
      }
    }
  },
  "required": ["line_items"],
  "additionalProperties": false
}
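When schemas are assembled in code rather than written by hand, small helpers can make it harder to forget "additionalProperties": false on array items. The following is one possible sketch; `field`, `object_schema`, and `array_of` are hypothetical helper names, and marking "line_items" as required at the root is an illustrative choice.

```python
import json


def field(kind, description):
    """Build a primitive field definition."""
    return {"type": kind, "description": description}


def object_schema(properties, required):
    """Build an object definition that always disables extra properties."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(required),
        "additionalProperties": False,
    }


def array_of(item_schema, description):
    """Build an array field whose elements follow item_schema."""
    return {"type": "array", "description": description, "items": item_schema}


line_item = object_schema(
    {
        "description": field("string", "Item description"),
        "quantity": field("number", "Quantity"),
        "price": field("number", "Unit price"),
    },
    required=["description", "quantity", "price"],
)

schema = object_schema(
    {"line_items": array_of(line_item, "Invoice line items")},
    required=["line_items"],
)

# json.dumps turns Python's False into the JSON literal false.
print(json.dumps(schema, indent=2))
```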

Nested Arrays

For hierarchical data, nest an array inside the objects of another array.
{
  "type": "object",
  "properties": {
    "orders": {
      "type": "array",
      "description": "Customer orders",
      "items": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "Order ID"
          },
          "shipments": {
            "type": "array",
            "description": "Shipments for this order",
            "items": {
              "type": "object",
              "properties": {
                "tracking_number": {
                  "type": "string",
                  "description": "Tracking number"
                },
                "carrier": {
                  "type": "string",
                  "description": "Shipping carrier"
                }
              },
              "required": ["tracking_number", "carrier"],
              "additionalProperties": false
            }
          }
        },
        "required": ["order_id", "shipments"],
        "additionalProperties": false
      }
    }
  },
  "required": ["orders"],
  "additionalProperties": false
}
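Nested output is often easier to load into a table once it is flattened. The sketch below assumes the extraction result mirrors the schema above as a plain dict; the sample data and the `flatten_shipments` helper are made up for illustration.

```python
def flatten_shipments(result):
    """Turn orders -> shipments into one flat row per shipment."""
    rows = []
    for order in result.get("orders", []):
        for shipment in order.get("shipments", []):
            rows.append(
                {
                    "order_id": order["order_id"],
                    "tracking_number": shipment["tracking_number"],
                    "carrier": shipment["carrier"],
                }
            )
    return rows


# Hypothetical result shaped like the schema above.
sample = {
    "orders": [
        {
            "order_id": "A-1",
            "shipments": [
                {"tracking_number": "T1", "carrier": "Carrier X"},
                {"tracking_number": "T2", "carrier": "Carrier Y"},
            ],
        },
        {"order_id": "A-2", "shipments": []},
    ]
}

for row in flatten_shipments(sample):
    print(row)
```

An order with no shipments, such as "A-2" here, produces no rows. Keep that in mind if every order must appear downstream.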
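Invoice example: a common schema for extracting data from US invoices. It combines nested objects (invoice_header, vendor, invoice_totals) with an array of line items.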
{
  "type": "object",
  "properties": {
    "document_type": {
      "type": "string",
      "description": "Type of document (invoice, receipt, etc.)"
    },
    "invoice_header": {
      "type": "object",
      "description": "Invoice header details",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "Invoice number"
        },
        "invoice_date": {
          "type": "string",
          "description": "Invoice issue date"
        },
        "due_date": {
          "type": "string",
          "description": "Payment due date"
        }
      },
      "required": ["invoice_number", "invoice_date"],
      "additionalProperties": false
    },
    "vendor": {
      "type": "object",
      "description": "Vendor information",
      "properties": {
        "vendor_name": {
          "type": "string",
          "description": "Legal business name"
        },
        "vendor_address": {
          "type": "string",
          "description": "Business address"
        },
        "vendor_email": {
          "type": "string",
          "description": "Accounts receivable contact email"
        }
      },
      "required": ["vendor_name"],
      "additionalProperties": false
    },
    "line_items": {
      "type": "array",
      "description": "List of billed items",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string",
            "description": "Item or service description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity billed"
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit in USD"
          },
          "line_total": {
            "type": "number",
            "description": "Total cost for this line item"
          }
        },
        "required": ["description", "quantity", "unit_price"],
        "additionalProperties": false
      }
    },
    "invoice_totals": {
      "type": "object",
      "description": "Invoice totals",
      "properties": {
        "subtotal": {
          "type": "number",
          "description": "Subtotal before tax"
        },
        "sales_tax": {
          "type": "number",
          "description": "Sales tax amount"
        },
        "total_amount_due": {
          "type": "number",
          "description": "Final amount due in USD"
        }
      },
      "required": ["total_amount_due"],
      "additionalProperties": false
    }
  },
  "required": ["document_type", "invoice_header", "invoice_totals"],
  "additionalProperties": false
}
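Because "line_total" and "subtotal" are optional in this schema, downstream code should tolerate their absence. The following sketch shows one way to sanity-check an extracted invoice. It assumes the result is a dict that mirrors the schema; the helper names and sample values are hypothetical.

```python
import math


def line_items_total(result):
    """Sum line totals, falling back to quantity * unit_price when absent."""
    total = 0.0
    for item in result.get("line_items", []):
        line_total = item.get("line_total")
        if line_total is None:
            line_total = item["quantity"] * item["unit_price"]
        total += line_total
    return total


def subtotal_matches(result, tolerance=0.005):
    """Return True/False if a subtotal was extracted, or None if it was not."""
    subtotal = result.get("invoice_totals", {}).get("subtotal")
    if subtotal is None:
        return None
    return math.isclose(line_items_total(result), subtotal, abs_tol=tolerance)


# Hypothetical extraction result.
invoice = {
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Setup", "quantity": 1, "unit_price": 5.5, "line_total": 5.5},
    ],
    "invoice_totals": {"subtotal": 25.5, "total_amount_due": 27.54},
}

print(subtotal_matches(invoice))  # True
```

A mismatch does not necessarily mean the extraction is wrong, since invoices can carry discounts or fees outside the line items. It is a signal worth reviewing.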
SEC filing example: a schema for extracting governance and ownership information from US SEC filings. Both top-level fields are arrays of objects.
{
  "type": "object",
  "properties": {
    "board_of_directors": {
      "type": "array",
      "description": "Board of Directors",
      "items": {
        "type": "object",
        "properties": {
          "director_name": {
            "type": "string",
            "description": "Full name of board member"
          }
        },
        "required": ["director_name"],
        "additionalProperties": false
      }
    },
    "major_shareholders": {
      "type": "array",
      "description": "Major shareholders and ownership",
      "items": {
        "type": "object",
        "properties": {
          "shareholder_name": {
            "type": "string",
            "description": "Name of shareholder"
          },
          "ownership_percentage": {
            "type": "string",
            "description": "Percentage ownership"
          }
        },
        "required": ["shareholder_name", "ownership_percentage"],
        "additionalProperties": false
      }
    }
  },
  "required": ["board_of_directors", "major_shareholders"],
  "additionalProperties": false
}
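Since "ownership_percentage" is typed as a string here, values may arrive as text such as "12.5%". A small post-processing step can convert them to numbers. This is a sketch under assumptions: `parse_percentage` is a hypothetical helper, the sample strings are invented, and a period is assumed as the decimal separator.

```python
import re


def parse_percentage(text):
    """Return the first number found in text as a float, or None if absent.

    Assumes a period as the decimal separator.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


print(parse_percentage("12.5%"))                    # 12.5
print(parse_percentage("approximately 7 percent"))  # 7.0
print(parse_percentage("N/A"))                      # None
```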