
# Schemas

> JSON Schema rules and patterns for defining what `/v2/extract` should pull out of a document.

Every extraction schema follows JSON Schema with strict-mode rules. These rules apply at every level (the root, every nested object, and every array's `items` definition) and keep the output deterministic and well-typed.

<Note>
  Prefer not to write JSON by hand? The [Unsiloed dashboard](https://app.unsiloed.ai/playground/Extractor) has a schema builder with Manual and Auto-Suggest modes. In Auto-Suggest, describe the fields you want, upload an example document, and the dashboard generates a schema you can export and pass to `/v2/extract`.
</Note>

## Core Requirements

**1. Root Object**

Every schema starts with `"type": "object"`. Arrays and primitives aren't allowed at the top level.

```json theme={null}
{
  "type": "object",
  "properties": {
    // Define your fields here
  },
  "required": [...],
  "additionalProperties": false
}
```
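Because the root must be an object, a document that is essentially one long list still needs a wrapper. The sketch below shows one way to do that in Python; `wrap_in_root` is a hypothetical helper name, not part of any Unsiloed SDK, and the `transactions` field is only an illustration.

```python
import json

def wrap_in_root(field_name: str, field_schema: dict) -> dict:
    """Place a non-object schema (for example an array) under one property of a root object."""
    return {
        "type": "object",
        "properties": {field_name: field_schema},
        "required": [field_name],
        "additionalProperties": False,
    }

# Hypothetical array schema that could not be used as a root on its own.
transactions = {
    "type": "array",
    "description": "Bank transactions",
    "items": {
        "type": "object",
        "properties": {"amount": {"type": "number", "description": "Amount in USD"}},
        "required": ["amount"],
        "additionalProperties": False,
    },
}

root_schema = wrap_in_root("transactions", transactions)
print(json.dumps(root_schema, indent=2))
```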

**2. Properties**

Define all fields you want to extract using the `"properties"` key. Each field must specify a `"type"` and should include a clear `"description"`.

```json theme={null}
{
  "type": "object",
  "properties": {
    "field_name_1": {
      "type": "string",
      "description": "Clear description of what to extract"
    },
    "field_name_2": {
      "type": "number",
      "description": "Description with units or context"
    }
  },
  "required": [...],
  "additionalProperties": false
}
```

**3. Required Fields**

Specify mandatory fields using the `"required"` array. Field names must exactly match those defined in `"properties"`.

```json theme={null}
{
  "type": "object",
  "properties": {
    "mandatory_field": { "type": "string", "description": "This field is required" },
    "another_required_field": { "type": "string", "description": "This is also required" }
  },
  "required": ["mandatory_field", "another_required_field"],
  "additionalProperties": false
}
```
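A mismatch between `"required"` and `"properties"` is easy to introduce through a typo or a difference in letter case. As a minimal sketch, the check below lists every required name that has no matching property at one object level; `unknown_required_names` is a hypothetical helper name chosen for this example.

```python
def unknown_required_names(schema: dict) -> list[str]:
    """Return names listed in "required" that are not defined in "properties"."""
    properties = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in properties]

schema = {
    "type": "object",
    "properties": {
        "mandatory_field": {"type": "string", "description": "This field is required"},
    },
    # "Mandatory_Field" differs in case, so it does not match the property above.
    "required": ["mandatory_field", "Mandatory_Field"],
    "additionalProperties": False,
}

print(unknown_required_names(schema))
```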

**4. Additional Properties**

Set `"additionalProperties": false` at every object level, including the root, nested objects, and the objects inside an array's `items`, so that only the fields you define appear in the output.

```json theme={null}
{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "field_name": { "type": "string" }
        },
        "required": [...],
        "additionalProperties": false  // Required in array items
      }
    }
  },
  "required": [...],
  "additionalProperties": false  // Required at root level
}
```
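Since the flag has to appear at every object level, it is easy to miss one in a deeply nested schema. The following sketch walks a schema recursively and reports the path of each object that does not set `"additionalProperties": false`. The function name and the `$`-style path notation are assumptions made for this example, not something the API defines.

```python
def objects_missing_strict_flag(schema: dict, path: str = "$") -> list[str]:
    """Return the paths of all object levels that do not set additionalProperties to false."""
    problems = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(path)
        # Descend into every property, which may itself be an object or an array.
        for name, child in schema.get("properties", {}).items():
            problems.extend(objects_missing_strict_flag(child, f"{path}.{name}"))
    elif schema.get("type") == "array":
        items = schema.get("items")
        if isinstance(items, dict):
            problems.extend(objects_missing_strict_flag(items, f"{path}[]"))
    return problems

schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"field_name": {"type": "string"}},
                # additionalProperties is deliberately missing here
            },
        }
    },
    "additionalProperties": False,
}

print(objects_missing_strict_flag(schema))
```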

## Supported Types

Besides `object`, which is used for the root and for nested groups of fields, extraction schemas support four field types:

**String:** For text, dates, IDs, names, addresses, and any textual data

```json theme={null}
{
  "field_name": {
    "type": "string",
    "description": "Description of the text field"
  }
}
```

**Number:** For integers and decimals like prices, quantities, counts, and measurements

```json theme={null}
{
  "field_name": {
    "type": "number",
    "description": "Description with units (e.g., USD, kg)"
  }
}
```

**Boolean:** For true/false values such as status flags and yes/no fields

```json theme={null}
{
  "field_name": {
    "type": "boolean",
    "description": "Description of the boolean condition"
  }
}
```

**Array:** For repeating items like line items or lists. An array must include `items` to define the structure of its elements.

```json theme={null}
{
  "field_name": {
    "type": "array",
    "description": "Description of the array items",
    "items": {
      "type": "object",
      "properties": {
        "item_field1": { "type": "string", "description": "..." },
        "item_field2": { "type": "string", "description": "..." }
      },
      "required": [...],
      "additionalProperties": false
    }
  }
}
```
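When checking extracted values in Python, the four type names map onto built-in types once the JSON has been parsed with `json.loads`. The sketch below is one way to write that check; `matches_type` is a hypothetical helper. One detail worth noting is that `bool` is a subclass of `int` in Python, so `True` would pass a naive number check.

```python
import json

def matches_type(value, type_name: str) -> bool:
    """Check a parsed JSON value against one of the four field type names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "number":
        # Exclude bool explicitly because isinstance(True, int) is True in Python.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "array":
        return isinstance(value, list)
    raise ValueError(f"Unsupported type name: {type_name}")

# Hypothetical values, parsed the same way a JSON response body would be.
parsed = json.loads('{"name": "Acme", "total": 12.5, "paid": true, "lines": []}')
print(matches_type(parsed["total"], "number"))
print(matches_type(parsed["paid"], "number"))
```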

## Building Schemas

### Primitive Types

Primitive fields use `string`, `number`, or `boolean` as their `type`. These are the building blocks of your schema.

```json theme={null}
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number"
    },
    "total_amount": {
      "type": "number",
      "description": "Total amount in USD"
    },
    "is_paid": {
      "type": "boolean",
      "description": "Payment status"
    }
  },
  "required": ["invoice_number", "total_amount", "is_paid"],
  "additionalProperties": false
}
```
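If you assemble schemas in code rather than by hand, a pair of small helpers keeps the boilerplate consistent. This is only a sketch: `primitive` and `flat_schema` are hypothetical names, and `flat_schema` assumes that every field should be listed in `"required"`, as in the example above.

```python
import json

PRIMITIVE_TYPES = {"string", "number", "boolean"}

def primitive(type_name: str, description: str) -> dict:
    """Build one primitive field definition."""
    if type_name not in PRIMITIVE_TYPES:
        raise ValueError(f"Not a primitive type: {type_name}")
    return {"type": type_name, "description": description}

def flat_schema(fields: dict) -> dict:
    """Build a root object in which every given field is required."""
    return {
        "type": "object",
        "properties": dict(fields),
        "required": list(fields),
        "additionalProperties": False,
    }

invoice_schema = flat_schema({
    "invoice_number": primitive("string", "Invoice number"),
    "total_amount": primitive("number", "Total amount in USD"),
    "is_paid": primitive("boolean", "Payment status"),
})
print(json.dumps(invoice_schema, indent=2))
```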

### Arrays of Objects

Use arrays when you have repeating data like line items, transactions, or people.

```json theme={null}
{
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "description": "Invoice line items",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string",
            "description": "Item description"
          },
          "quantity": {
            "type": "number",
            "description": "Quantity"
          },
          "price": {
            "type": "number",
            "description": "Unit price"
          }
        },
        "required": ["description", "quantity", "price"],
        "additionalProperties": false
      }
    }
  },
  "required": ["line_items"],
  "additionalProperties": false
}
```
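An array of objects maps naturally onto a table: the property names in `items` become columns and each element becomes a row. The sketch below writes such an array to CSV with the Python standard library. The sample rows are invented for illustration, and the code assumes the extracted result has already been reduced to a plain list of dicts that matches the schema.

```python
import csv
import io

def array_to_csv(items_schema: dict, rows: list) -> str:
    """Render a list of extracted objects as CSV, using the schema's property order as columns."""
    columns = list(items_schema["properties"])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# The "items" definition from the line_items schema above, reduced to its property names.
items_schema = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "number"},
        "price": {"type": "number"},
    },
}

# Hypothetical extracted rows.
rows = [
    {"description": "Widget", "quantity": 2, "price": 9.5},
    {"description": "Gadget", "quantity": 1, "price": 20},
]
print(array_to_csv(items_schema, rows))
```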

### Nested Arrays

For hierarchical data, nest an array inside the objects of another array.

```json theme={null}
{
  "type": "object",
  "properties": {
    "orders": {
      "type": "array",
      "description": "Customer orders",
      "items": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "Order ID"
          },
          "shipments": {
            "type": "array",
            "description": "Shipments for this order",
            "items": {
              "type": "object",
              "properties": {
                "tracking_number": {
                  "type": "string",
                  "description": "Tracking number"
                },
                "carrier": {
                  "type": "string",
                  "description": "Shipping carrier"
                }
              },
              "required": ["tracking_number", "carrier"],
              "additionalProperties": false
            }
          }
        },
        "required": ["order_id", "shipments"],
        "additionalProperties": false
      }
    }
  },
  "required": ["orders"],
  "additionalProperties": false
}
```
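Downstream systems often want nested results as flat rows. As a sketch, the function below turns a result shaped like the schema above into one row per shipment, carrying the parent `order_id` along. The sample data is invented, and the code assumes the result is a plain dict that matches the schema; the actual response envelope of the API may differ.

```python
def flatten_shipments(result: dict) -> list:
    """Produce one flat row per shipment, repeating the parent order_id on each row."""
    rows = []
    for order in result.get("orders", []):
        for shipment in order.get("shipments", []):
            rows.append({
                "order_id": order["order_id"],
                "tracking_number": shipment["tracking_number"],
                "carrier": shipment["carrier"],
            })
    return rows

# Hypothetical extracted result.
result = {
    "orders": [
        {
            "order_id": "A-1",
            "shipments": [
                {"tracking_number": "T100", "carrier": "UPS"},
                {"tracking_number": "T101", "carrier": "FedEx"},
            ],
        },
        {"order_id": "A-2", "shipments": []},
    ]
}

for row in flatten_shipments(result):
    print(row)
```

An order with no shipments, such as `A-2` here, contributes no rows. Keep that in mind if you need every order to appear in the flattened output.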

<Accordion title="Example 1: Invoice Extraction">
  A common schema for extracting data from US invoices.

  ```json theme={null}
  {
    "type": "object",
    "properties": {
      "document_type": {
        "type": "string",
        "description": "Type of document (invoice, receipt, etc.)"
      },
      "invoice_header": {
        "type": "object",
        "description": "Invoice header details",
        "properties": {
          "invoice_number": {
            "type": "string",
            "description": "Invoice number"
          },
          "invoice_date": {
            "type": "string",
            "description": "Invoice issue date"
          },
          "due_date": {
            "type": "string",
            "description": "Payment due date"
          }
        },
        "required": ["invoice_number", "invoice_date"],
        "additionalProperties": false
      },
      "vendor": {
        "type": "object",
        "description": "Vendor information",
        "properties": {
          "vendor_name": {
            "type": "string",
            "description": "Legal business name"
          },
          "vendor_address": {
            "type": "string",
            "description": "Business address"
          },
          "vendor_email": {
            "type": "string",
            "description": "Accounts receivable contact email"
          }
        },
        "required": ["vendor_name"],
        "additionalProperties": false
      },
      "line_items": {
        "type": "array",
        "description": "List of billed items",
        "items": {
          "type": "object",
          "properties": {
            "description": {
              "type": "string",
              "description": "Item or service description"
            },
            "quantity": {
              "type": "number",
              "description": "Quantity billed"
            },
            "unit_price": {
              "type": "number",
              "description": "Price per unit in USD"
            },
            "line_total": {
              "type": "number",
              "description": "Total cost for this line item"
            }
          },
          "required": ["description", "quantity", "unit_price"],
          "additionalProperties": false
        }
      },
      "invoice_totals": {
        "type": "object",
        "description": "Invoice totals",
        "properties": {
          "subtotal": {
            "type": "number",
            "description": "Subtotal before tax"
          },
          "sales_tax": {
            "type": "number",
            "description": "Sales tax amount"
          },
          "total_amount_due": {
            "type": "number",
            "description": "Final amount due in USD"
          }
        },
        "required": ["total_amount_due"],
        "additionalProperties": false
      }
    },
    "required": ["document_type", "invoice_header", "invoice_totals"],
    "additionalProperties": false
  }
  ```
</Accordion>
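Numeric fields in the invoice schema make simple arithmetic cross-checks possible after extraction. The sketch below flags line totals and invoice totals that do not add up. It assumes a result shaped like the invoice schema, treats optional fields as either absent or `null`, and uses a tolerance of one cent; `invoice_total_mismatches` and the sample data are hypothetical.

```python
import math

def invoice_total_mismatches(invoice: dict, tolerance: float = 0.01) -> list:
    """Return the paths of total fields that disagree with the values they are derived from."""
    issues = []
    for index, item in enumerate(invoice.get("line_items") or []):
        line_total = item.get("line_total")
        if line_total is not None:
            expected = item["quantity"] * item["unit_price"]
            if not math.isclose(line_total, expected, abs_tol=tolerance):
                issues.append(f"line_items[{index}].line_total")
    totals = invoice["invoice_totals"]
    subtotal, sales_tax = totals.get("subtotal"), totals.get("sales_tax")
    if subtotal is not None and sales_tax is not None:
        expected = subtotal + sales_tax
        if not math.isclose(totals["total_amount_due"], expected, abs_tol=tolerance):
            issues.append("invoice_totals.total_amount_due")
    return issues

# Hypothetical extracted invoice with one inconsistent line total.
invoice = {
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0, "line_total": 25.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 5.0},
    ],
    "invoice_totals": {"subtotal": 25.0, "sales_tax": 2.0, "total_amount_due": 27.0},
}
print(invoice_total_mismatches(invoice))
```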

<Accordion title="Example 2: Public Company Filing (10-K / Annual Report)">
  A schema for extracting governance and ownership information from US SEC filings.

  ```json theme={null}
  {
    "type": "object",
    "properties": {
      "board_of_directors": {
        "type": "array",
        "description": "Board of Directors",
        "items": {
          "type": "object",
          "properties": {
            "director_name": {
              "type": "string",
              "description": "Full name of board member"
            }
          },
          "required": ["director_name"],
          "additionalProperties": false
        }
      },
      "major_shareholders": {
        "type": "array",
        "description": "Major shareholders and ownership",
        "items": {
          "type": "object",
          "properties": {
            "shareholder_name": {
              "type": "string",
              "description": "Name of shareholder"
            },
            "ownership_percentage": {
              "type": "string",
              "description": "Percentage ownership as stated in the filing, e.g. 12.5%"
            }
          },
          "required": ["shareholder_name", "ownership_percentage"],
          "additionalProperties": false
        }
      }
    },
    "required": ["board_of_directors", "major_shareholders"],
    "additionalProperties": false
  }
  ```
</Accordion>
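Because `ownership_percentage` is declared as a string, the value usually needs to be converted before any numeric comparison. The sketch below pulls the first number out of the text. It assumes a dot as the decimal separator, and `parse_percentage` is a hypothetical helper name.

```python
import re
from typing import Optional

def parse_percentage(text: str) -> Optional[float]:
    """Return the first number found in the text, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

# Hypothetical extracted values.
for raw in ["12.5%", "approximately 7 percent", "N/A"]:
    print(raw, "->", parse_percentage(raw))
```

If you would rather receive a number directly, declaring the field as `"type": "number"` with the unit spelled out in the description is an alternative worth testing against your documents.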
