# Getting Started With Classification

> Submit a document and a list of candidate categories to /classify, then read back the predicted category.

<Note>
  Classification picks the best-fit label for a document from a list of candidate categories you supply. The endpoint returns the matched category and a confidence score, ready to feed into routing logic. For raw Markdown or structured field extraction instead, see the [Parse quickstart](/quickstart) or the [Extraction quickstart](/document-processing/extraction/quickstart).
</Note>

This walkthrough builds a script that submits a PDF and your candidate categories to `/classify`, polls until the job finishes, and saves the full response, including the matched category and confidence score, to disk. The accordion below has the full script if you'd rather copy and run it directly.

<Accordion title="Show the Full Script">
  Set `UNSILOED_API_KEY` in your environment and save the document you want to classify as `document.pdf` in the same directory before running.

  <Tabs>
    <Tab title="Python">
      ```python classify_document.py theme={null}
      import json
      import os
      import time
      import requests

      API_KEY = os.environ["UNSILOED_API_KEY"]
      BASE_URL = "https://prod.visionapi.unsiloed.ai"

      categories = [
          {"name": "Sales Report"},
          {"name": "Invoice"},
          {"name": "Medical Record"},
      ]

      with open("document.pdf", "rb") as f:
          response = requests.post(
              f"{BASE_URL}/classify",
              headers={"api-key": API_KEY},
              files={"pdf_file": ("document.pdf", f, "application/pdf")},
              data={"categories": json.dumps(categories)},
          )
      response.raise_for_status()

      job_id = response.json()["job_id"]
      print(f"Job submitted: {job_id}")

      max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
      attempts = 0
      while True:
          poll = requests.get(
              f"{BASE_URL}/classify/{job_id}",
              headers={"api-key": API_KEY},
          )
          poll.raise_for_status()  # surface HTTP errors (e.g. 401, 404) before reading "status"
          result = poll.json()
          print(f"Status: {result['status']}")
          if result["status"] == "completed":
              break
          if result["status"] == "failed":
              raise RuntimeError(result.get("error", "classify job failed"))
          attempts += 1
          if attempts >= max_attempts:
              raise TimeoutError("Classify job did not finish within 5 minutes")
          time.sleep(5)

      with open("classification.json", "w") as f:
          json.dump(result, f, indent=2)

      classification = result["result"]
      print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")
      ```
    </Tab>

    <Tab title="JavaScript">
      Save this as `script.mjs` or set `"type": "module"` in your `package.json`. Requires Node.js 18 or newer for the global `fetch`, `FormData`, and `Blob`.

      ```javascript script.mjs theme={null}
      import fs from "node:fs";

      const API_KEY = process.env.UNSILOED_API_KEY;
      const BASE_URL = "https://prod.visionapi.unsiloed.ai";

      const categories = [
        { name: "Sales Report", description: "Sales performance summaries with regional or quarterly data" },
        { name: "Invoice", description: "Bill of sale with line items" },
        { name: "Medical Record", description: "Patient health records" },
      ];

      const form = new FormData();
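      // Set the content type explicitly so it matches the uploaded file format.
      form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")], { type: "application/pdf" }), "document.pdf");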
      form.append("categories", JSON.stringify(categories));

      const response = await fetch(`${BASE_URL}/classify`, {
        method: "POST",
        headers: { "api-key": API_KEY },
        body: form,
      });
      if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);

      const { job_id } = await response.json();
      console.log(`Job submitted: ${job_id}`);

      const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
      let attempts = 0;
      let result;
      while (true) {
        const res = await fetch(`${BASE_URL}/classify/${job_id}`, {
          headers: { "api-key": API_KEY },
        });
        if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
        result = await res.json();
        console.log(`Status: ${result.status}`);
        if (result.status === "completed") break;
        if (result.status === "failed") throw new Error(result.error || "classify job failed");
        if (++attempts >= maxAttempts) throw new Error("Classify job did not finish within 5 minutes");
        await new Promise((r) => setTimeout(r, 5000));
      }

      fs.writeFileSync("classification.json", JSON.stringify(result, null, 2));
      const { classification, confidence } = result.result;
      console.log(`Classification: ${classification} (${(confidence * 100).toFixed(2)}% confidence)`);
      ```
    </Tab>

    <Tab title="cURL">
      ```bash theme={null}
      # Submit the document and capture the job_id from the response:
      resp=$(curl -sX POST "https://prod.visionapi.unsiloed.ai/classify" \
        -H "api-key: $UNSILOED_API_KEY" \
        -F "pdf_file=@document.pdf" \
        -F 'categories=[{"name":"Sales Report"},{"name":"Invoice"},{"name":"Medical Record"}]')
      JOB_ID=$(echo "$resp" | grep -o '"job_id":"[^"]*"' | cut -d'"' -f4)
      echo "Job submitted: $JOB_ID"

      # Poll until the job finishes, with a 5-minute timeout:
      attempts=0
      max_attempts=60
      while true; do
        resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/classify/$JOB_ID" \
          -H "api-key: $UNSILOED_API_KEY")
        status=$(echo "$resp" | grep -o '"status":"[^"]*"' | head -1 | cut -d'"' -f4)
        echo "Status: $status"
        [ "$status" = "completed" ] && break
        [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
        attempts=$((attempts + 1))
        [ "$attempts" -ge "$max_attempts" ] && { echo "Classify job did not finish within 5 minutes"; exit 1; }
        sleep 5
      done

      # Save the full response to disk:
      echo "$resp" > classification.json
      ```
    </Tab>
  </Tabs>
</Accordion>

## Step 1: Set Up Your Environment

Before writing any code, gather three things: an API key, a document, and the runtime for the chosen language.

### 1.1 Get an Unsiloed AI API Key

To get API access, [sign up on Unsiloed AI](https://cal.com/aman-mishra-p0ry57/15min). Export your key as an environment variable named `UNSILOED_API_KEY` so it stays out of source control:

```bash theme={null}
export UNSILOED_API_KEY="your-api-key"
```

### 1.2 Pick a Document to Classify

The `/classify` endpoint supports PDF, DOCX, PPTX, JPG, PNG, and other formats. The walkthrough below assumes a PDF saved as `document.pdf` in your working directory. To use a different format, update the filename and content type in the snippets to match your file.
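
If you switch formats often, the content type can be derived from the filename instead of edited by hand. The following is a minimal sketch using Python's standard `mimetypes` module; the helper name `guess_content_type` and the `application/octet-stream` fallback are illustrative choices, not part of the API:

```python
import mimetypes


def guess_content_type(filename: str) -> str:
    """Return a MIME type for the upload based on the file extension."""
    content_type, _ = mimetypes.guess_type(filename)
    # Assumption: fall back to a generic binary type for unknown extensions.
    return content_type or "application/octet-stream"


print(guess_content_type("document.pdf"))  # application/pdf
print(guess_content_type("scan.png"))      # image/png
```

The returned value can stand in for the hard-coded `"application/pdf"` in the `files` tuple of the Python snippet.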

If you don't have a document handy, download our [sample PDF](https://raw.githubusercontent.com/Unsiloed-AI/cookbook/c585446e46e4be2790c6c29fe2a7a3a1b346191d/sample-documents/sample-classify.pdf) (a one-page lab report from Riverside Diagnostic Laboratory) and save it as `document.pdf`. The walkthrough scores it against three candidate categories so we can see a clear winner.

### 1.3 Install Dependencies

<Tabs>
  <Tab title="Python">
    You need Python 3.8 or newer. Install the `requests` package:

    ```bash theme={null}
    pip install requests
    ```
  </Tab>

  <Tab title="JavaScript">
    You need Node.js 18 or newer for the global `fetch`, `FormData`, and `Blob`. No external packages needed.
  </Tab>

  <Tab title="cURL">
    You need cURL, which is preinstalled on macOS and most Linux distributions. No external packages needed.
  </Tab>
</Tabs>

## Step 2: Submit a Document With Categories

The request bundles two fields: `pdf_file` for the document and `categories` for a JSON-stringified array of category objects, each with a `name` and an optional `description`. The categories list is the model's entire vocabulary for this call, so clear and distinct names matter more than they might appear. The endpoint returns a `job_id` to poll. All requests go to `https://prod.visionapi.unsiloed.ai` with the API key in the `api-key` header.
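
Because the category list arrives as a single JSON string, it can help to build and sanity-check it in one place before sending. The sketch below is one way to do that in Python; `encode_categories` is a hypothetical helper, and the client-side checks (non-empty, case-insensitively distinct names) are assumptions about what makes a good list, not rules the API is documented to enforce:

```python
import json


def encode_categories(names, descriptions=None):
    """Build the JSON string for the `categories` form field."""
    descriptions = descriptions or {}
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        raise ValueError("at least one category name is required")
    # Assumption: names that differ only by case are treated as duplicates.
    if len({name.lower() for name in cleaned}) != len(cleaned):
        raise ValueError("category names must be distinct")
    categories = []
    for name in cleaned:
        entry = {"name": name}
        if name in descriptions:
            entry["description"] = descriptions[name]
        categories.append(entry)
    return json.dumps(categories)


payload = encode_categories(["Sales Report", "Invoice", "Medical Record"])
print(payload)
```

The resulting string is what goes into the `categories` form field, in place of the inline `json.dumps(categories)` call.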

### 2.1 Set Up the Script

<Tabs>
  <Tab title="Python">
    Create a file called `classify_document.py` and start with the imports, configuration, and category list:

    ```python classify_document.py theme={null}
    import json
    import os
    import time
    import requests

    API_KEY = os.environ["UNSILOED_API_KEY"]
    BASE_URL = "https://prod.visionapi.unsiloed.ai"

    categories = [
        {"name": "Sales Report"},
        {"name": "Invoice"},
        {"name": "Medical Record"},
    ]
    ```

    `API_KEY` reads your key from the environment so it doesn't get hard-coded into the file, and `BASE_URL` points at the Unsiloed AI production endpoint. The `categories` list defines the candidate labels the model picks from. Only the names guide the result; a `description` key is accepted but not used by classification.
  </Tab>

  <Tab title="JavaScript">
    Create a file called `script.mjs` and start with the imports, configuration, and category list:

    ```javascript script.mjs theme={null}
    import fs from "node:fs";

    const API_KEY = process.env.UNSILOED_API_KEY;
    const BASE_URL = "https://prod.visionapi.unsiloed.ai";

    const categories = [
      { name: "Sales Report", description: "Sales performance summaries with regional or quarterly data" },
      { name: "Invoice", description: "Bill of sale with line items" },
      { name: "Medical Record", description: "Patient health records" },
    ];
    ```

    `API_KEY` reads your key from the environment so it doesn't get hard-coded into the file, and `BASE_URL` points at the Unsiloed AI production endpoint. The `categories` list defines the candidate labels the model picks from. Only the names guide the result; a `description` key is accepted but not used by classification.
  </Tab>

  <Tab title="cURL">
    cURL doesn't need a setup step. Each command below inlines the API key, base URL, and category list directly.
  </Tab>
</Tabs>

### 2.2 Upload the Document

Send the file and the JSON-encoded category list as a multipart upload to `/classify`. The document goes under `pdf_file` and the categories under `categories`.

<Tabs>
  <Tab title="Python">
    Continue the file by uploading the document:

    ```python classify_document.py theme={null}
    with open("document.pdf", "rb") as f:
        response = requests.post(
            f"{BASE_URL}/classify",
            headers={"api-key": API_KEY},
            files={"pdf_file": ("document.pdf", f, "application/pdf")},
            data={"categories": json.dumps(categories)},
        )
    response.raise_for_status()
    ```

    `raise_for_status()` raises an `HTTPError` on any 4xx or 5xx response, so there's no need to check `.status_code` separately.
  </Tab>

  <Tab title="JavaScript">
    Continue the file by uploading the document:

    ```javascript script.mjs theme={null}
    const form = new FormData();
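    // Set the content type explicitly so it matches the uploaded file format.
    form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")], { type: "application/pdf" }), "document.pdf");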
    form.append("categories", JSON.stringify(categories));

    const response = await fetch(`${BASE_URL}/classify`, {
      method: "POST",
      headers: { "api-key": API_KEY },
      body: form,
    });
    if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);
    ```

    `fetch` rejects only on network failures, not on HTTP error statuses, so we check `response.ok` and throw the error explicitly.
  </Tab>

  <Tab title="cURL">
    Run:

    ```bash theme={null}
    curl -X POST "https://prod.visionapi.unsiloed.ai/classify" \
      -H "api-key: $UNSILOED_API_KEY" \
      -F "pdf_file=@document.pdf" \
      -F 'categories=[{"name":"Sales Report"},{"name":"Invoice"},{"name":"Medical Record"}]'
    ```

    The response prints to stdout. We need the `job_id` field for the next step.
  </Tab>
</Tabs>

### 2.3 Capture the Job ID

<Tabs>
  <Tab title="Python">
    Next, read and print the `job_id`:

    ```python classify_document.py theme={null}
    job_id = response.json()["job_id"]
    print(f"Job submitted: {job_id}")
    ```

    Run the script:

    ```bash theme={null}
    python classify_document.py
    ```

    The output should be a single line like `Job submitted: 2c231adf-ad5e-4e2e-8c0c-10cd7025c09b`.
  </Tab>

  <Tab title="JavaScript">
    Next, read and log the `job_id`:

    ```javascript script.mjs theme={null}
    const { job_id } = await response.json();
    console.log(`Job submitted: ${job_id}`);
    ```

    Run the script:

    ```bash theme={null}
    node script.mjs
    ```

    The output should be a single line like `Job submitted: 2c231adf-ad5e-4e2e-8c0c-10cd7025c09b`.
  </Tab>

  <Tab title="cURL">
    The response body from the POST above looks like:

    ```json theme={null}
    {
      "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
      "status": "processing",
      "message": "Classification started",
      "quota_remaining": 7704
    }
    ```

    Copy the `job_id` value to paste into the polling command in the next step.
  </Tab>
</Tabs>

## Step 3: Poll for Results

The job runs asynchronously. GET `/classify/{job_id}` repeatedly until the status is `completed`, then save the classification to disk.

A status of `completed` means the result is ready. A status of `failed` means the job errored. Any other value (such as `processing`) means the job is still running.
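
The three-way rule above can be written as a small decision function. This is a sketch only: `next_action` and its return values are hypothetical names, and treating a response with no `status` field as an error is an assumption (an HTTP error body, for example, carries `detail` instead):

```python
def next_action(job: dict) -> str:
    """Map a polling response to 'done', 'error', or 'wait'."""
    status = job.get("status")
    if status is None:
        # Assumption: a body without "status" is not a job response at all.
        raise ValueError("response has no 'status' field")
    if status == "completed":
        return "done"
    if status == "failed":
        return "error"
    # Any other value, such as "processing", means the job is still running.
    return "wait"


print(next_action({"status": "processing"}))  # wait
```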

### 3.1 Write the Polling Loop

<Tabs>
  <Tab title="Python">
    Drop in a polling loop. The `max_attempts` cap stops the loop if the job hangs:

    ```python classify_document.py theme={null}
    max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
    attempts = 0
    while True:
        poll = requests.get(
            f"{BASE_URL}/classify/{job_id}",
            headers={"api-key": API_KEY},
        )
        poll.raise_for_status()  # surface HTTP errors (e.g. 401, 404) before reading "status"
        result = poll.json()
        print(f"Status: {result['status']}")
        if result["status"] == "completed":
            break
        if result["status"] == "failed":
            raise RuntimeError(result.get("error", "classify job failed"))
        attempts += 1
        if attempts >= max_attempts:
            raise TimeoutError("Classify job did not finish within 5 minutes")
        time.sleep(5)
    ```
  </Tab>

  <Tab title="JavaScript">
    Drop in a polling loop. The `maxAttempts` cap stops the loop if the job hangs:

    ```javascript script.mjs theme={null}
    const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
    let attempts = 0;
    let result;
    while (true) {
      const res = await fetch(`${BASE_URL}/classify/${job_id}`, {
        headers: { "api-key": API_KEY },
      });
      if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
      result = await res.json();
      console.log(`Status: ${result.status}`);
      if (result.status === "completed") break;
      if (result.status === "failed") throw new Error(result.error || "classify job failed");
      if (++attempts >= maxAttempts) throw new Error("Classify job did not finish within 5 minutes");
      await new Promise((r) => setTimeout(r, 5000));
    }
    ```
  </Tab>

  <Tab title="cURL">
    Replace `paste-job-id-here` below with the `job_id` you copied in Step 2.3, then run this loop. It polls every 5 seconds and gives up after 5 minutes if the job hasn't completed:

    ```bash theme={null}
    JOB_ID="paste-job-id-here"
    attempts=0
    max_attempts=60  # roughly 5 minutes at 5 seconds per poll

    while true; do
      resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/classify/$JOB_ID" \
        -H "api-key: $UNSILOED_API_KEY")
      status=$(echo "$resp" | grep -o '"status":"[^"]*"' | head -1 | cut -d'"' -f4)
      echo "Status: $status"
      [ "$status" = "completed" ] && break
      [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
      attempts=$((attempts + 1))
      [ "$attempts" -ge "$max_attempts" ] && { echo "Classify job did not finish within 5 minutes"; exit 1; }
      sleep 5
    done
    ```

    The loop keeps the latest response body in `$resp` for the next step.
  </Tab>
</Tabs>
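
All three loops share one pattern: fetch, check the status, sleep, and give up after a fixed number of attempts. A sketch of that pattern as a reusable Python function follows. `poll_until_done`, its parameters, and the injected `fetch_job` callable are hypothetical names for illustration; injecting the fetch step also lets the loop be exercised without network access:

```python
import time


def poll_until_done(fetch_job, max_attempts=60, delay=5.0, sleep=time.sleep):
    """Call fetch_job() until the job completes, fails, or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error") or "classify job failed")
        # Skip the sleep after the final attempt so the timeout fires promptly.
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(f"job did not finish after {max_attempts} attempts")


# Demo with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "processing"},
    {"status": "completed", "result": {"classification": "Medical Record"}},
])
job = poll_until_done(lambda: next(responses), delay=0)
print(job["result"]["classification"])  # Medical Record
```

In the walkthrough script, `fetch_job` would wrap the `requests.get(...)` call to `/classify/{job_id}` and return its parsed JSON body.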

### 3.2 Save the Classification

Persist the result to disk so downstream code can read it. The full response, including the per-page breakdown, goes to `classification.json`.

<Tabs>
  <Tab title="Python">
    Finally, write the result to disk and print a summary:

    ```python classify_document.py theme={null}
    with open("classification.json", "w") as f:
        json.dump(result, f, indent=2)

    classification = result["result"]
    print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")
    ```

    Run the script:

    ```bash theme={null}
    python classify_document.py
    ```

    You should see one or two `Status: processing` lines, then `Status: completed`, then a summary line like `Classification: Medical Record (100.00% confidence)`. The `classification.json` file appears in the working directory.
  </Tab>

  <Tab title="JavaScript">
    Finally, write the result to disk and log a summary:

    ```javascript script.mjs theme={null}
    fs.writeFileSync("classification.json", JSON.stringify(result, null, 2));
    const { classification, confidence } = result.result;
    console.log(`Classification: ${classification} (${(confidence * 100).toFixed(2)}% confidence)`);
    ```

    Run the script:

    ```bash theme={null}
    node script.mjs
    ```

    You should see one or two `Status: processing` lines, then `Status: completed`, then a summary line like `Classification: Medical Record (100.00% confidence)`. The `classification.json` file appears in the working directory.
  </Tab>

  <Tab title="cURL">
    The polling loop in Step 3.1 left the full response in `$resp`. Write it to disk:

    ```bash theme={null}
    echo "$resp" > classification.json
    ```

    The `classification.json` file now holds the full response. The overall label lives under `result.classification` and the per-page breakdown under `result.page_results`.
  </Tab>
</Tabs>

## Error Responses

Failures fall into two buckets: HTTP errors returned when the API rejects a request outright, and a `failed` status on a job that was accepted but could not complete.

### HTTP Errors

The `/classify` endpoint returns JSON error bodies under a `detail` field. The common cases are:

* **`401 Unauthorized`:** `{"detail":"Invalid API key"}`. The `api-key` header is missing or wrong.
* **`400 Bad Request`:** `{"detail":"Either pdf_file or file_url must be provided"}` or `{"detail":"At least one category is required"}`. The request carried no document, or the `categories` array was empty.
* **`422 Unprocessable Entity`:** `{"detail":[{"type":"missing","loc":["body","categories"],"msg":"Field required","input":null}]}`. A required form field, usually `categories`, was left out of the request entirely.
* **`404 Not Found`:** `{"detail":"Job not found"}`. The `job_id` you polled doesn't exist.
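
Note that `detail` is a plain string in most cases but a list of objects in the `422` case, so code that logs or displays it needs to handle both. A minimal sketch, assuming only the two shapes shown above; `error_message` is a hypothetical helper:

```python
def error_message(body: dict) -> str:
    """Flatten the `detail` field of an error body into one readable string."""
    detail = body.get("detail")
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        parts = []
        for item in detail:
            # "loc" is a path such as ["body", "categories"].
            location = ".".join(str(part) for part in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', 'invalid')}")
        return "; ".join(parts)
    return "unknown error"


print(error_message({"detail": "Invalid API key"}))
# Invalid API key
print(error_message({"detail": [{"type": "missing", "loc": ["body", "categories"],
                                 "msg": "Field required", "input": None}]}))
# body.categories: Field required
```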

### Failed Jobs

A job that was accepted but could not be processed returns `status: "failed"` on the polling endpoint. The response shape matches a successful one, but `result` is absent and the `error` field describes what went wrong:

```json theme={null}
{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "failed",
  "progress": "Classification failed",
  "error": "Invalid PDF format"
}
```

## Response Shape

A completed job returns job metadata plus a nested `result` object that contains the overall classification, a confidence score, and per-page results.

```json theme={null}
{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}
```

The fields you use depend on what you're building. They fall into three broad groups:

**For routing decisions:**

* **`result.classification`:** the overall predicted category for the document, drawn from the `name` values you submitted. This is the field the walkthrough prints.
* **`result.confidence`:** confidence score for the overall classification, on a 0-1 scale. Treat it as a soft signal: high values rarely need review, low values flag documents worth a human look.
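
As an illustration of using these two fields together, the sketch below sends low-confidence documents to a review queue and everything else to a queue named after the category. The `0.9` threshold, the `manual-review` queue, and the slug naming scheme are all assumptions to tune for your own pipeline:

```python
REVIEW_THRESHOLD = 0.9  # assumption: tune against your own documents


def route(job: dict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return a queue name for a completed classify response."""
    result = job["result"]
    if result["confidence"] < threshold:
        return "manual-review"
    # Hypothetical naming scheme: "Medical Record" -> "medical-record"
    return result["classification"].lower().replace(" ", "-")


job = {"status": "completed",
       "result": {"classification": "Medical Record", "confidence": 1.0}}
print(route(job))  # medical-record
```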

**For per-page handling and mixed-content documents:**

* **`result.page_results[]`:** the per-page classifications the overall result is built from
* **`page_results[].page`:** 1-indexed page number
* **`page_results[].classification`:** the predicted category for that page
* **`page_results[].raw_result`:** the model's raw output before normalization to a category name; usually identical to `classification`
* **`page_results[].confidence`:** the page-level confidence score on a 0-1 scale
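
For mixed-content documents, one way to use the per-page fields is to group pages by label and flag any page that disagrees with the document-level result. The helpers below are a hypothetical sketch, and the three-page response in the demo is made-up data shaped like the fields above, not real API output:

```python
from collections import defaultdict


def pages_by_category(job: dict) -> dict:
    """Group 1-indexed page numbers by their predicted category."""
    groups = defaultdict(list)
    for page in job["result"]["page_results"]:
        groups[page["classification"]].append(page["page"])
    return dict(groups)


def disagreeing_pages(job: dict) -> list:
    """List pages whose label differs from the document-level label."""
    overall = job["result"]["classification"]
    return [page["page"] for page in job["result"]["page_results"]
            if page["classification"] != overall]


# Made-up example response for a three-page bundle.
sample = {"result": {"classification": "Invoice", "page_results": [
    {"page": 1, "classification": "Invoice", "confidence": 0.97},
    {"page": 2, "classification": "Invoice", "confidence": 0.91},
    {"page": 3, "classification": "Medical Record", "confidence": 0.88},
]}}
print(pages_by_category(sample))  # {'Invoice': [1, 2], 'Medical Record': [3]}
print(disagreeing_pages(sample))  # [3]
```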

**For job tracking:**

* **`status`:** `completed`, `failed`, or an in-progress value such as `processing`
* **`progress`:** human-readable progress message
* **`error`:** error message if the job failed, otherwise `null`
* **`result.total_pages` and `result.processed_pages`:** how much of the document the classifier got through

### Sample Output

Running the script against the [sample lab report](https://raw.githubusercontent.com/Unsiloed-AI/cookbook/c585446e46e4be2790c6c29fe2a7a3a1b346191d/sample-documents/sample-classify.pdf) and the three categories above writes the verdict to `classification.json`:

```json theme={null}
{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}
```

The Riverside Diagnostic lab report lands cleanly in the `Medical Record` bucket with full confidence. Swap in your own document and category list to see how the classifier handles ambiguous cases.

## Next Steps

<Note>
  For more on classification, including category design tips and the canonical response shape, see the [Classification overview](/document-processing/classification/classification).
</Note>

<CardGroup cols={2}>
  <Card title="Classification Overview" icon="tags" href="/document-processing/classification/classification">
    Understand how the classifier scores pages and when to reach for it.
  </Card>

  <Card title="Response Format" icon="square-list" href="/document-processing/classification/response-format">
    Browse the full classification response with examples for each job state.
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/classification/classify-document">
    Browse the full request and response specs for the classify endpoint.
  </Card>

  <Card title="Splitting" icon="scissors" href="/document-processing/splitting/splitting">
    Split a mixed bundle into separate documents by section.
  </Card>
</CardGroup>
