# Parse Document (v3)

> Async PDF parsing that returns markdown. Supports inline upload, public URL, presigned upload for large files, and archive-based batches — all under one endpoint with per-key isolation.

## Overview

The v3 endpoint parses a PDF (or archive of PDFs) and returns **markdown text** for each page. Compared to v1/v2 it is intentionally simpler: no layout / OCR-engine / segment-analysis knobs, no segment tree in the response. You submit a PDF, you get markdown back. The pipeline picks the best model per page internally.

The endpoint is **async**: every submission returns a `job_id` you poll until the job reaches a terminal state.

**Endpoint base URL:** `https://prod.visionapi.unsiloed.ai/v3/parse`

<Tip>
  Use this endpoint when you want clean markdown out of a PDF without configuring layout or OCR settings. For fine-grained control over segment types, bounding boxes, and per-segment processing, use the [v2 Parse Document](/api-reference/parser/parse-document-v2) endpoint instead.
</Tip>

The v3 surface has four routes:

| Route                    | Use it for                                                                   |
| ------------------------ | ---------------------------------------------------------------------------- |
| `POST /v3/parse`         | Submit a single PDF — three body shapes (multipart, JSON URL, JSON file\_id) |
| `POST /v3/parse/upload`  | Mint a presigned PUT URL for PDFs larger than the inline cap                 |
| `POST /v3/parse/batch`   | Submit a tar/tar.gz/zip archive of PDFs in one job                           |
| `GET /v3/parse/{job_id}` | Poll status and retrieve the inline markdown result                          |

## Authentication

Every request requires an `X-API-Key` header. Keys are **personal**, **rate-limited per key** (100 requests/day, 2 RPS), and **isolated** — you can only see your own jobs.

<Note>
  v3 API keys are issued on request — they are separate from v1/v2 keys. To get one, email **[aman@unsiloed.ai](mailto:aman@unsiloed.ai)** (or open an issue at [github.com/Unsiloed-AI/unsiloed-olmocr-benchmark](https://github.com/Unsiloed-AI/unsiloed-olmocr-benchmark/issues/new)) with a one-line note about what you're evaluating. Typical turnaround is same-day.
</Note>
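
The per-key limits above (2 RPS, 100 requests/day) can be respected on the client before the gateway starts returning `429`. The following is a minimal sketch, assuming a single-process client; the `Throttle` class and its parameters are illustrative names, not part of the API, and the caller is expected to create a fresh instance (or reset `remaining`) when the daily quota resets.

```python
import time

class Throttle:
    """Client-side guard for a per-second rate and a daily request budget."""

    def __init__(self, max_rps=2.0, daily_quota=100,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_rps   # 0.5 s between calls at 2 RPS
        self.remaining = daily_quota        # hypothetical local counter
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def acquire(self):
        """Block until the next request is allowed; raise when the budget is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily request budget exhausted")
        now = self._clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last = now
        self.remaining -= 1
```

Calling `throttle.acquire()` immediately before each HTTP request keeps submissions and polls under the same budget.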

## Guarantees

| Property                  | What it means for you                                                                                                                                                      |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Per-key isolation**     | Polling another user's `job_id` returns `404`, as does trying to re-parse another user's `file_id`.                                                                        |
| **24-hour retention**     | Every job artifact — your uploaded PDF, status, result, container logs — is deleted automatically 24 hours after the job is created. Pull your results within that window. |
| **No scoring on our end** | The API returns markdown only. If you want to reproduce a benchmark number, run the unmodified upstream scorer against the markdown locally.                               |

***

## POST /v3/parse — Submit a single PDF

`POST /v3/parse` accepts **three** body shapes, auto-detected from the `Content-Type` header (multipart or JSON); the two JSON shapes are told apart by the key they carry, `url` or `file_id`. All three submit the same async job and return the same response.

### Body shape 1 — Inline multipart upload

For small PDFs (up to \~3 MB raw). Single HTTP call.

<ParamField body="file" type="file" required>
  PDF binary, sent as `multipart/form-data`. Capped at \~3 MB raw (≈ 4 MB after base64 encoding inside API Gateway). For larger files use body shape 2 or 3.
</ParamField>

<ParamField query="pages" type="string">
  Optional query parameter on the request URL. Restrict OCR to a subset of pages.

  * `"1-5"`: pages 1 through 5
  * `"1,3,5"`: specific pages
  * omitted: all pages
</ParamField>

```bash theme={null}
curl -X POST \
  -H "X-API-Key: <your-api-key>" \
  -F file=@input.pdf \
  https://prod.visionapi.unsiloed.ai/v3/parse
```
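
The `pages` value can be built from a list of page numbers rather than written by hand. This is a small sketch that only emits the two documented forms (a single range such as `1-5`, or a comma list such as `1,3,5`); the helper name `pages_selector` is hypothetical, and 1-indexed page numbers are an assumption based on the response format.

```python
def pages_selector(pages):
    """Return a `pages` query value: "a-b" for one contiguous run, else "a,b,c"."""
    nums = sorted(set(pages))
    if not nums:
        raise ValueError("pages is empty; omit the parameter to parse all pages")
    if nums[0] < 1:
        raise ValueError("page numbers are assumed to be 1-indexed")
    contiguous = nums[-1] - nums[0] + 1 == len(nums)
    if contiguous and len(nums) > 1:
        return f"{nums[0]}-{nums[-1]}"
    return ",".join(str(n) for n in nums)
```

With `requests`, the result can be passed as `params={"pages": pages_selector([1, 3, 5])}` on the submission call.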

### Body shape 2 — JSON with caller-hosted URL

For PDFs up to 50 MB that you already host (S3 public-read, S3 presigned, GitHub release asset, your own web server, etc.). Single HTTP call.

<ParamField body="url" type="string" required>
  Publicly fetchable `https://` URL **or** `s3://bucket/key` reference to the PDF. We fetch it. URLs pointing at private IPs, link-local, or AWS instance metadata are rejected by an SSRF guard.
</ParamField>

```bash theme={null}
curl -X POST \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/your-paper.pdf"}' \
  https://prod.visionapi.unsiloed.ai/v3/parse
```
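
A cheap local check can catch URLs that the guard described above would likely reject, before a request is spent on them. The sketch below is an approximation, not the server's actual rule set: it accepts only `https://` and `s3://`, conservatively treats every IP-literal host as rejected, and does not resolve DNS. The function name `looks_fetchable` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

def looks_fetchable(url):
    """Local pre-check only; the server-side guard remains authoritative."""
    parts = urlsplit(url)
    if parts.scheme == "s3":
        # s3://bucket/key needs both a bucket and a non-empty key
        return bool(parts.netloc) and len(parts.path) > 1
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True   # a hostname, not an IP literal
    return False      # IP literals (private, link-local, metadata) assumed rejected
```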

### Body shape 3 — JSON with `file_id` from a presigned upload

For PDFs up to 50 MB that you do **not** want to host publicly. First call `POST /v3/parse/upload` (below) to get a presigned `upload_url` and `file_id`; PUT your PDF to the URL; then submit the parse using the `file_id`.

<ParamField body="file_id" type="string" required>
  The `file_id` returned by `POST /v3/parse/upload`. Submit it only after the `PUT` to `upload_url` has completed. It also serves as the `job_id` for subsequent polling.
</ParamField>

```bash theme={null}
curl -X POST \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"<file_id>"}' \
  https://prod.visionapi.unsiloed.ai/v3/parse
```

<Note>
  You must use **the same API key** that minted the `file_id` via `POST /v3/parse/upload`. Cross-key submissions return `404` (hiding existence) — this is how per-key isolation is enforced.
</Note>
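
The three body shapes differ mainly in size limit and number of calls, so a client can pick one from the file size. The sketch below is one possible selection rule, not a prescribed one: `choose_body_shape` is a hypothetical helper, and the inline cutoff of exactly 3 MiB is an assumption, since the cap is only documented as approximately 3 MB.

```python
INLINE_CAP_BYTES = 3 * 1024 * 1024   # "~3 MB raw"; the exact cutoff is an assumption
MAX_PDF_BYTES = 50 * 1024 * 1024     # 52428800, the documented 50 MB limit

def choose_body_shape(size_bytes, hosted_url=None):
    """Return "multipart", "url", or "file_id" for a PDF of the given size."""
    if size_bytes > MAX_PDF_BYTES:
        raise ValueError(f"PDF is {size_bytes} bytes; the limit is {MAX_PDF_BYTES}")
    if hosted_url is not None:
        return "url"         # body shape 2: already hosted, single call
    if size_bytes <= INLINE_CAP_BYTES:
        return "multipart"   # body shape 1: single call
    return "file_id"         # body shape 3: presigned upload, three calls
```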

### Response (any body shape)

<ResponseField name="job_id" type="string" required>
  Job identifier (32-character UUID hex). Pass to `GET /v3/parse/{job_id}` to poll. When you used body shape 3, this equals the `file_id` you submitted.
</ResponseField>

<ResponseField name="status" type="string" required>
  Always `"queued"` on submission. Subsequent values: `"running"` → `"done"` or `"failed"`.
</ResponseField>

<ResponseField name="created_at" type="string" required>
  ISO 8601 timestamp when the job was created.
</ResponseField>

```json theme={null}
{
  "job_id":     "5eb3493042f84af5860531df5b18c56b",
  "status":     "queued",
  "created_at": "2026-05-11T18:47:26Z"
}
```
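
Because every job artifact is deleted 24 hours after the job is created, `created_at` is enough to compute the deadline for pulling results. The following sketch assumes the timestamp always carries a UTC designator as in the example above; `retention_deadline` and `is_expired` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)

def retention_deadline(created_at):
    """Parse an ISO 8601 `created_at` value and return when the artifacts expire."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created + RETENTION

def is_expired(created_at, now=None):
    """True once the 24-hour retention window has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= retention_deadline(created_at)
```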

***

## POST /v3/parse/upload — Presigned upload URL

Returns a presigned S3 `PUT` URL so you can upload a PDF directly (bypassing the API Gateway request size cap). Use this for the 3-call flow of body shape 3 above. **No request body required** — just an empty POST with the auth header.

```bash theme={null}
curl -X POST \
  -H "X-API-Key: <your-api-key>" \
  https://prod.visionapi.unsiloed.ai/v3/parse/upload
```

### Response

<ResponseField name="file_id" type="string" required>
  Opaque identifier. After you PUT the PDF to `upload_url`, pass this back as `{"file_id": "..."}` to `POST /v3/parse` to start parsing.
</ResponseField>

<ResponseField name="upload_url" type="string" required>
  Presigned S3 `PUT` URL. 1-hour expiry from issuance. **Send the PDF body directly to this URL with HTTP method `PUT` and `Content-Type: application/pdf`.** The transfer bypasses our API Gateway entirely.
</ResponseField>

<ResponseField name="upload_method" type="string" required>
  Always `"PUT"`.
</ResponseField>

<ResponseField name="upload_content_type" type="string" required>
  Always `"application/pdf"`. Your `PUT` must set the same `Content-Type` header.
</ResponseField>

<ResponseField name="max_bytes" type="integer" required>
  Maximum PDF size accepted by the pipeline after upload. Currently `52428800` (50 MB).
</ResponseField>

<ResponseField name="expires_in" type="integer" required>
  Seconds until the `upload_url` expires (3600).
</ResponseField>

```json theme={null}
{
  "file_id":             "527f4097f3d1...",
  "upload_url":          "https://...s3.amazonaws.com/...?X-Amz-Signature=...",
  "upload_method":       "PUT",
  "upload_content_type": "application/pdf",
  "max_bytes":           52428800,
  "expires_in":          3600,
  "next":                "POST /v3/parse with body {\"file_id\": \"527f4097f3d1...\"}"
}
```
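
Reading the method and content type from the response, instead of hard-coding them, keeps the upload step aligned with whatever the endpoint returns. A minimal sketch, assuming the response has been parsed into a dict; `put_request_for` is a hypothetical helper that also rejects files larger than `max_bytes` before any bytes are sent.

```python
def put_request_for(upload, size_bytes):
    """Build method, URL, and headers for the direct upload; reject oversized files."""
    if size_bytes > upload["max_bytes"]:
        raise ValueError(
            f"PDF is {size_bytes} bytes; max_bytes is {upload['max_bytes']}"
        )
    return {
        "method": upload["upload_method"],   # documented as always "PUT"
        "url": upload["upload_url"],
        "headers": {"Content-Type": upload["upload_content_type"]},
    }
```

The returned values map directly onto `requests.request(method, url, headers=..., data=file_handle)`.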

### Full 3-call flow

```bash theme={null}
export API="https://prod.visionapi.unsiloed.ai/v3/parse"
export KEY="<your-api-key>"   # quoted: bare angle brackets are parsed as shell redirections

# 1. Mint a presigned URL
UP=$(curl -s -X POST -H "X-API-Key: $KEY" "$API/upload")
FILE_ID=$(jq -r .file_id <<< "$UP")
UPLOAD_URL=$(jq -r .upload_url <<< "$UP")

# 2. PUT the PDF directly to S3 (no auth header needed — URL is presigned)
curl -X PUT \
  -H "Content-Type: application/pdf" \
  --upload-file big.pdf \
  "$UPLOAD_URL"

# 3. Start the parse
curl -X POST \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$FILE_ID\"}" \
  $API
```

***

## POST /v3/parse/batch — Archive of PDFs

Process many PDFs in one job. You host an archive of PDFs; we fetch it and process every PDF inside.

<ParamField body="url" type="string" required>
  Public `https://` URL or `s3://` reference to a `.tar`, `.tar.gz`/`.tgz`, or `.zip` archive of PDFs. Archive format is **auto-detected by content sniffing** the first bytes, not by file extension. Non-PDF files inside the archive are skipped silently.
</ParamField>

```bash theme={null}
curl -X POST \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/my-bench.tar.gz"}' \
  https://prod.visionapi.unsiloed.ai/v3/parse/batch
```

Submission response is the same shape as `POST /v3/parse`:

```json theme={null}
{
  "job_id":     "...",
  "status":     "queued",
  "created_at": "..."
}
```

The completion response uses `documents[]` instead of `pages[]` (one entry per PDF in the archive); see the batch response under `GET /v3/parse/{job_id}` below.
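
Preparing the archive for a batch job can be scripted with the standard library. The sketch below packs every `.pdf` under a directory into a `.tar.gz`, keeping relative paths so they match the `pdf` field in the completion response. `build_pdf_archive` is a hypothetical helper; it matches the lowercase `.pdf` extension only, and hosting the resulting archive is left to the caller.

```python
import tarfile
from pathlib import Path

def build_pdf_archive(src_dir, archive_path):
    """Pack every *.pdf under src_dir into a gzip-compressed tar; return member names."""
    src = Path(src_dir)
    members = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for pdf in sorted(src.rglob("*.pdf")):
            rel = pdf.relative_to(src).as_posix()   # relative path inside the archive
            tar.add(pdf, arcname=rel)
            members.append(rel)
    if not members:
        raise ValueError(f"no PDFs found under {src_dir}")
    return members
```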

***

## GET /v3/parse/{job_id} — Poll status + retrieve result

```bash theme={null}
curl -H "X-API-Key: <your-api-key>" \
  https://prod.visionapi.unsiloed.ai/v3/parse/<job_id>
```

### Query parameters

<ParamField query="format" type="string">
  When set to `"markdown"` **and** `status` is `"done"`, returns concatenated page markdown as `Content-Type: text/markdown; charset=utf-8` instead of a JSON envelope. Useful for `curl ... | tee out.md`. Ignored while the job is queued/running/failed.
</ParamField>

### Response — while running

```json theme={null}
{
  "job_id":     "5eb3493042f84af5860531df5b18c56b",
  "status":     "running",
  "created_at": "...",
  "started_at": "...",
  "progress":   { "page": 2, "of": 5 },
  "phase":      "ocr"
}
```

### Response — single-PDF done

<ResponseField name="status" type="string" required>
  `"done"` for a completed single-PDF job.
</ResponseField>

<ResponseField name="page_count" type="integer" required>
  Number of pages in the PDF (after applying the `pages` selector, if any).
</ResponseField>

<ResponseField name="pages" type="array" required>
  Per-page markdown. Each entry has `page` (1-indexed integer) and `markdown` (string). Page order is ascending.
</ResponseField>

```json theme={null}
{
  "job_id":      "5eb3493042f84af5860531df5b18c56b",
  "status":      "done",
  "file_name":   "input.pdf",
  "created_at":  "...",
  "started_at":  "...",
  "finished_at": "...",
  "page_count":  3,
  "pages": [
    { "page": 1, "markdown": "..." },
    { "page": 2, "markdown": "..." },
    { "page": 3, "markdown": "..." }
  ]
}
```
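
When the JSON envelope is already in hand, the per-page entries can be joined locally instead of issuing a second request with `format=markdown`. A minimal sketch: `pages_to_markdown` is a hypothetical helper, and the blank-line separator is an assumption, since the separator used by the server-side concatenation is not documented here.

```python
def pages_to_markdown(result, separator="\n\n"):
    """Concatenate per-page markdown from a single-PDF `done` response, in page order."""
    if result.get("status") != "done":
        raise ValueError(f"job is {result.get('status')!r}, not 'done'")
    pages = sorted(result["pages"], key=lambda p: p["page"])
    return separator.join(p["markdown"] for p in pages)
```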

### Response — batch done

<ResponseField name="documents" type="array" required>
  One entry per PDF found in the archive. Each entry has `pdf` (relative path inside the archive), `page_count`, and `pages[]` (same shape as single-PDF). If a particular PDF failed, the entry has an `error` field instead of `pages`.
</ResponseField>

```json theme={null}
{
  "job_id":     "...",
  "status":     "done",
  "source_url": "https://example.com/my-bench.tar.gz",
  "pdf_count":  3,
  "documents": [
    { "pdf": "doc1.pdf", "page_count": 1,
      "pages": [{ "page": 1, "markdown": "..." }] },
    { "pdf": "doc2.pdf", "page_count": 2,
      "pages": [{ "page": 1, "markdown": "..." }, { "page": 2, "markdown": "..." }] },
    { "pdf": "broken.pdf",
      "error": "PdfStreamError: Stream has ended unexpectedly" }
  ]
}
```
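
Since a batch can finish with `status: "done"` while individual PDFs carry an `error`, clients usually need to separate the two cases. The sketch below does that for a parsed response dict; `split_batch` is a hypothetical helper name.

```python
def split_batch(result):
    """Return (parsed, failed) from a batch `done` response.

    parsed maps each PDF path to its list of page markdown strings;
    failed maps each PDF path to its error message.
    """
    parsed, failed = {}, {}
    for doc in result["documents"]:
        if "error" in doc:
            failed[doc["pdf"]] = doc["error"]
        else:
            parsed[doc["pdf"]] = [p["markdown"] for p in doc["pages"]]
    return parsed, failed
```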

<Note>
  If the JSON would exceed API Gateway's 10 MB response cap, the response is `{ "job_id", "status": "done", "result_url" }` instead — fetch `result_url` (presigned S3 GET) to download the same JSON. The schema of the downloaded JSON is identical to the inline shape, so clients can use one code path for both.
</Note>
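
The single code path mentioned in the note can be sketched as a small resolver that follows `result_url` when it is present and otherwise returns the body unchanged. `resolve_result` and its injectable `fetch` argument are hypothetical; the default fetcher assumes the presigned URL needs no `X-API-Key` header.

```python
import json
from urllib.request import urlopen

def resolve_result(body, fetch=None):
    """Return the full result, downloading it when only `result_url` was returned."""
    if "result_url" not in body:
        return body   # inline result, already complete
    if fetch is None:
        def fetch(url):
            # Presigned S3 GET; assumed to work without the API key header.
            with urlopen(url, timeout=60) as resp:
                return resp.read()
    return json.loads(fetch(body["result_url"]))
```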

### Response — failed

```json theme={null}
{
  "job_id":     "...",
  "status":     "failed",
  "created_at": "...",
  "error":      "PDF exceeds size limit (50 MB)"
}
```

### Polling example

```python theme={null}
import requests, time

API = "https://prod.visionapi.unsiloed.ai/v3/parse"
KEY = "<your-api-key>"

def wait_for(job_id):
    while True:
        r = requests.get(f"{API}/{job_id}", headers={"X-API-Key": KEY}, timeout=30)  # fail fast on a stalled connection
        r.raise_for_status()
        body = r.json()
        status = body["status"]
        if status in ("done", "failed"):
            return body
        time.sleep(5)
```
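
The loop above polls at a fixed 5-second interval with no upper bound. If polls count against the daily quota (this page does not say either way), a growing interval with a cap on the number of polls is a safer default. The sketch below is one such variant under that assumption; `poll_delays`, `wait_with_budget`, and the default numbers are illustrative, and `get_status` stands for any callable that returns the parsed JSON body.

```python
import time

def poll_delays(initial=5.0, factor=1.5, cap=60.0):
    """Yield sleep intervals that grow geometrically up to `cap` seconds."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)

def wait_with_budget(get_status, max_polls=20, sleep=time.sleep):
    """Call get_status() until a terminal state, using at most max_polls requests."""
    delays = poll_delays()
    for attempt in range(max_polls):
        body = get_status()
        if body["status"] in ("done", "failed"):
            return body
        if attempt < max_polls - 1:
            sleep(next(delays))   # no sleep after the final poll
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

A caller would pass something like `lambda: requests.get(f"{API}/{job_id}", headers=...).json()` as `get_status`.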

***

## Error responses

| Status                  | Body / Header                                  | When                                                                                                       | What to do                                                                                |
| ----------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `401`                   | `{"message": "Unauthorized"}`                  | Missing or invalid `X-API-Key`                                                                             | Use a personal key; request one if you don't have it yet                                  |
| `403`                   | `{"message": "Forbidden"}`                     | Key not yet propagated through API Gateway edges (within 30–60s of issuance)                               | Retry after a minute                                                                      |
| `404`                   | `{"error": "Job not found"}`                   | Job doesn't exist, **or** the job belongs to a different API key                                           | Confirm you're using the same key that submitted the job; otherwise check the `job_id`    |
| `404`                   | `{"error": "no upload found for file_id=..."}` | The `file_id` you sent doesn't have an uploaded PDF behind it, **or** it belongs to a different key        | Make sure you completed the `PUT` step from `/upload`, and that you're using the same key |
| `413`                   | `{"message": "Request Too Long"}`              | Multipart body too big for API Gateway's request cap                                                       | Switch to body shape 2 (JSON URL) or body shape 3 (presigned upload)                      |
| `429`                   | `{"message": "Limit Exceeded"}`                | Per-key quota (100 requests/day) or rate limit (2 RPS / 2 burst) exceeded                                  | Slow down and retry; quota resets daily                                                   |
| `400`                   | `{"error": "invalid url: ..."}`                | URL validation failed (wrong scheme, IP-literal, link-local, RFC1918, AWS metadata, etc.)                  | Use an `https://` or `s3://` URL pointing at a public/presigned object                    |
| `400`                   | `{"error": "multipart parse failed: ..."}`     | Malformed multipart body                                                                                   | Verify your client sets `Content-Type: multipart/form-data; boundary=...` correctly       |
| job `failed` (200 body) | `{"status": "failed", "error": "..."}`         | Container hit a runtime error (file isn't a real PDF, archive contains no PDFs, fetch URL timed out, etc.) | Read the `error` field; fix and resubmit                                                  |
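
The table above can be folded into a small dispatch helper so that client code treats retryable and non-retryable statuses differently. This is a sketch under assumptions: `next_action` is a hypothetical helper, the 60-second delay follows the "retry after a minute" guidance for `403`, and the other delays and the handling of statuses not listed in the table are guesses. A job that ends in `failed` still arrives with a 200 response, so it has to be detected from the body, not from the status code.

```python
RETRY_AFTER_SECONDS = {
    403: 60,   # key still propagating; retry after a minute
    429: 1,    # rate limit; a spent daily quota keeps failing until it resets
}
CALLER_ERRORS = {400, 401, 404, 413}   # fix the request, key, or body shape first

def next_action(status_code):
    """Map an HTTP status to ("ok" | "retry" | "fix", delay_seconds)."""
    if 200 <= status_code < 300:
        return ("ok", 0)
    if status_code in RETRY_AFTER_SECONDS:
        return ("retry", RETRY_AFTER_SECONDS[status_code])
    if status_code in CALLER_ERRORS:
        return ("fix", 0)
    return ("retry", 5)   # not covered by the table; assumed transient
```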

## Code examples

<RequestExample>
  ```bash cURL — Body shape 1 (multipart) theme={null}
  curl -X POST \
    -H "X-API-Key: your-api-key" \
    -F file=@input.pdf \
    "https://prod.visionapi.unsiloed.ai/v3/parse"
  ```

  ```bash cURL — Body shape 2 (caller-hosted URL) theme={null}
  curl -X POST \
    -H "X-API-Key: your-api-key" \
    -H "Content-Type: application/json" \
    -d '{"url":"https://example.com/your-paper.pdf"}' \
    "https://prod.visionapi.unsiloed.ai/v3/parse"
  ```

  ```bash cURL — Body shape 3 (3-call upload flow) theme={null}
  KEY=your-api-key
  API=https://prod.visionapi.unsiloed.ai/v3/parse

  UP=$(curl -s -X POST -H "X-API-Key: $KEY" "$API/upload")
  FILE_ID=$(jq -r .file_id <<< "$UP")
  UPLOAD_URL=$(jq -r .upload_url <<< "$UP")

  curl -X PUT \
    -H "Content-Type: application/pdf" \
    --upload-file big.pdf \
    "$UPLOAD_URL"

  curl -X POST \
    -H "X-API-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d "{\"file_id\":\"$FILE_ID\"}" \
    "$API"
  ```

  ```bash cURL — Batch theme={null}
  curl -X POST \
    -H "X-API-Key: your-api-key" \
    -H "Content-Type: application/json" \
    -d '{"url":"https://example.com/my-bench.tar.gz"}' \
    "https://prod.visionapi.unsiloed.ai/v3/parse/batch"
  ```

  ```python Python theme={null}
  import os, time, requests

  API = "https://prod.visionapi.unsiloed.ai/v3/parse"
  KEY = os.environ["UNSILOED_API_KEY"]
  HEADERS = {"X-API-Key": KEY}

  # Small PDF (≤ 3 MB) — body shape 1
  def submit_small(pdf_path: str) -> str:
      with open(pdf_path, "rb") as f:
          r = requests.post(
              API,
              headers=HEADERS,
              files={"file": (os.path.basename(pdf_path), f, "application/pdf")},
          )
      r.raise_for_status()
      return r.json()["job_id"]

  # Caller-hosted URL — body shape 2
  def submit_url(url: str) -> str:
      r = requests.post(
          API,
          headers={**HEADERS, "Content-Type": "application/json"},
          json={"url": url},
      )
      r.raise_for_status()
      return r.json()["job_id"]

  # Large PDF via presigned upload — body shape 3
  def submit_large(pdf_path: str) -> str:
      up_resp = requests.post(f"{API}/upload", headers=HEADERS)
      up_resp.raise_for_status()  # surface 401/403/429 before reading the body
      up = up_resp.json()
      file_id, upload_url = up["file_id"], up["upload_url"]
      with open(pdf_path, "rb") as f:
          requests.put(
              upload_url,
              headers={"Content-Type": "application/pdf"},
              data=f,
          ).raise_for_status()
      r = requests.post(
          API,
          headers={**HEADERS, "Content-Type": "application/json"},
          json={"file_id": file_id},
      )
      r.raise_for_status()
      return r.json()["job_id"]

  # Poll until done or failed
  def wait(job_id: str) -> dict:
      while True:
          r = requests.get(f"{API}/{job_id}", headers=HEADERS)
          r.raise_for_status()  # error bodies (404, 429) carry no "status" key
          body = r.json()
          if body["status"] in ("done", "failed"):
              return body
          time.sleep(5)

  job = submit_small("input.pdf")
  result = wait(job)
  if result["status"] == "done":
      for p in result["pages"]:
          print(f"--- page {p['page']} ---")
          print(p["markdown"])
  else:
      print("FAILED:", result.get("error"))
  ```

  ```javascript JavaScript theme={null}
  const API = "https://prod.visionapi.unsiloed.ai/v3/parse";
  const KEY = process.env.UNSILOED_API_KEY;

  // Body shape 1 — small PDF via multipart
  async function submitSmall(file) {
    const form = new FormData();
    form.append("file", file);
    const r = await fetch(API, { method: "POST",
      headers: { "X-API-Key": KEY }, body: form });
    if (!r.ok) throw new Error(`submit failed: ${r.status}`);
    return (await r.json()).job_id;
  }

  // Body shape 2 — caller-hosted URL
  async function submitUrl(url) {
    const r = await fetch(API, { method: "POST",
      headers: { "X-API-Key": KEY, "Content-Type": "application/json" },
      body: JSON.stringify({ url }) });
    if (!r.ok) throw new Error(`submit failed: ${r.status}`);
    return (await r.json()).job_id;
  }

  // Body shape 3 — presigned upload for large files
  async function submitLarge(file) {
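    const upRes = await fetch(`${API}/upload`, { method: "POST",
      headers: { "X-API-Key": KEY } });
    if (!upRes.ok) throw new Error(`upload mint failed: ${upRes.status}`);
    const up = await upRes.json();
    const put = await fetch(up.upload_url, { method: "PUT",
      headers: { "Content-Type": "application/pdf" }, body: file });
    if (!put.ok) throw new Error(`upload PUT failed: ${put.status}`);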
    const r = await fetch(API, { method: "POST",
      headers: { "X-API-Key": KEY, "Content-Type": "application/json" },
      body: JSON.stringify({ file_id: up.file_id }) });
    if (!r.ok) throw new Error(`submit failed: ${r.status}`);
    return (await r.json()).job_id;
  }

  async function wait(jobId) {
    while (true) {
      const r = await fetch(`${API}/${jobId}`, { headers: { "X-API-Key": KEY } });
      if (!r.ok) throw new Error(`poll failed: ${r.status}`);
      const body = await r.json();
      if (body.status === "done" || body.status === "failed") return body;
      await new Promise(res => setTimeout(res, 5000));
    }
  }
  ```
</RequestExample>

<ResponseExample>
  ```json Submission (201) theme={null}
  {
    "job_id":     "5eb3493042f84af5860531df5b18c56b",
    "status":     "queued",
    "created_at": "2026-05-11T18:47:26Z"
  }
  ```

  ```json Presigned upload (POST /v3/parse/upload, 201) theme={null}
  {
    "file_id":             "527f4097f3d1...",
    "upload_url":          "https://...s3.amazonaws.com/...?X-Amz-Signature=...",
    "upload_method":       "PUT",
    "upload_content_type": "application/pdf",
    "max_bytes":           52428800,
    "expires_in":          3600
  }
  ```

  ```json Single-PDF done (GET /v3/parse/{job_id}) theme={null}
  {
    "job_id":      "5eb3493042f84af5860531df5b18c56b",
    "status":      "done",
    "file_name":   "input.pdf",
    "started_at":  "2026-05-11T18:47:36Z",
    "finished_at": "2026-05-11T18:47:53Z",
    "page_count":  1,
    "pages": [
      { "page": 1, "markdown": "Example 4.28. Let us use Proposition 4.26 ..." }
    ]
  }
  ```

  ```json Batch done (GET /v3/parse/{job_id}) theme={null}
  {
    "job_id":     "...",
    "status":     "done",
    "source_url": "https://example.com/my-bench.tar.gz",
    "pdf_count":  3,
    "documents": [
      { "pdf": "doc1.pdf", "page_count": 1,
        "pages": [{ "page": 1, "markdown": "..." }] },
      { "pdf": "doc2.pdf", "page_count": 2,
        "pages": [{ "page": 1, "markdown": "..." }, { "page": 2, "markdown": "..." }] }
    ]
  }
  ```

  ```json Failed (GET /v3/parse/{job_id}) theme={null}
  {
    "job_id":     "...",
    "status":     "failed",
    "created_at": "...",
    "error":      "PDF exceeds size limit (50 MB)"
  }
  ```
</ResponseExample>

## See also

* **[Open-source benchmark harness + client](https://github.com/Unsiloed-AI/unsiloed-olmocr-benchmark)** — reproduces our published olmOCR-Bench numbers across vendors and includes a thin client (`clients/bench_via_api.py`) that calls this endpoint, collects the returned markdown, and scores it with the unmodified upstream scorer.
* **[v1 Parse Document](/api-reference/parser/parse-document)** — segmented response with bounding boxes, OCR data, and per-segment processing knobs.
* **[v2 Parse Document (Presigned Upload)](/api-reference/parser/parse-document-v2)** — segmented response variant with presigned upload for larger files and higher throughput.


## OpenAPI

````yaml api-reference/parser/openapi-v3.json POST /v3/parse
openapi: 3.1.0
info:
  title: Unsiloed Parser API — v3
  description: >-
    Async PDF parsing that returns markdown. Supports inline multipart upload,
    caller-hosted URL, presigned upload for large files, and archive-based
    batches — all under one endpoint with per-key isolation.
  contact:
    name: Unsiloed
    url: https://unsiloed.ai
    email: hello@unsiloed.com
  license:
    name: ''
  version: 3.0.0
servers:
  - url: https://prod.visionapi.unsiloed.ai
    description: Production
security: []
tags:
  - name: Parse v3
    description: Async markdown-only parsing with per-key isolation
paths:
  /v3/parse:
    post:
      tags:
        - Parse v3
      summary: POST /v3/parse
      description: >-
        Submit a single PDF for async parsing. Accepts three body shapes
        (auto-detected from Content-Type): multipart inline upload, JSON with a
        caller-hosted URL, or JSON with a `file_id` from a presigned upload.
        Returns a `job_id` you poll via GET /v3/parse/{job_id}.
      operationId: submit_parse_v3
      parameters:
        - name: pages
          in: query
          required: false
          description: >-
            Optional page selector. Examples: `1-5`, `1,3,5`. Omitted means all
            pages.
          schema:
            type: string
      requestBody:
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/MultipartParseRequest'
          application/json:
            schema:
              oneOf:
                - $ref: '#/components/schemas/UrlParseRequest'
                - $ref: '#/components/schemas/FileIdParseRequest'
        required: true
      responses:
        '201':
          description: Job accepted and queued
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobSubmission'
        '400':
          description: Invalid request body, malformed multipart, or rejected URL
        '401':
          description: Unauthorized — missing or invalid X-API-Key
        '403':
          description: Forbidden — key not yet propagated
        '413':
          description: Request body too large — use body shape 2 or 3
        '429':
          description: Per-key quota or rate limit exceeded
      security:
        - api_key: []
components:
  schemas:
    MultipartParseRequest:
      type: object
      description: Body shape 1 — multipart inline upload for small PDFs (≤ ~3 MB raw).
      required:
        - file
      properties:
        file:
          type: string
          format: binary
          description: PDF binary. Capped at ~3 MB raw.
    UrlParseRequest:
      type: object
      description: Body shape 2 — caller-hosted URL (up to 50 MB).
      required:
        - url
      properties:
        url:
          type: string
          description: >-
            Publicly fetchable https:// URL or s3://bucket/key reference. URLs
            pointing at private IPs, link-local, or AWS instance metadata are
            rejected by an SSRF guard.
    FileIdParseRequest:
      type: object
      description: Body shape 3 — submit a previously presigned-uploaded PDF by file_id.
      required:
        - file_id
      properties:
        file_id:
          type: string
          description: >-
            The file_id returned by POST /v3/parse/upload after the PUT
            completes. Must be used with the same API key that minted it.
    JobSubmission:
      type: object
      description: Async job acknowledgment.
      required:
        - job_id
        - status
        - created_at
      properties:
        job_id:
          type: string
          description: 32-character UUID hex. Pass to GET /v3/parse/{job_id} to poll.
        status:
          type: string
          description: Always `queued` on submission.
          enum:
            - queued
        created_at:
          type: string
          description: ISO 8601 timestamp when the job was created.
  securitySchemes:
    api_key:
      type: apiKey
      in: header
      name: X-API-Key
      description: Personal API key. Issued separately from v1/v2 keys.

````