POST /v2/parse/upload

Overview

The v2 presigned upload endpoint decouples document delivery from job creation. Instead of uploading your file through the API server, you:
  1. POST to /v2/parse/upload with your configuration — the API creates a parse job and returns a short-lived presigned URL.
  2. PUT your file directly to the presigned URL — the API server is never in the transfer path.
  3. Once the upload completes, the job is automatically enqueued for processing.
  4. Poll GET /parse/{job_id} to track progress and retrieve results — the same endpoint as v1.
This is the recommended endpoint for large files and high-throughput workloads. It supports faster uploads, larger file sizes, and higher concurrency than the standard Parse Document endpoint.
The job status starts as AwaitingUpload. It transitions to Queued once the upload completes, then to Processing, and finally Succeeded or Failed. See Get Parse Job Status for the full response schema and polling examples.

Request

file_name
string
required
File name with extension (e.g. "report.pdf"). Determines the MIME type for the presigned URL. Supported formats: PDF, images (PNG, JPEG, TIFF), and office documents (DOCX, PPTX, XLSX).
use_high_resolution
boolean
Use high-resolution images for cropping and post-processing. Defaults to true.
segmentation_method
string
Layout analysis strategy. smart_layout_detection (default) detects fine-grained elements using bounding boxes. page_by_page treats each page as a single segment and is faster for simple documents.
ocr_mode
string
OCR processing mode. auto_ocr (default) applies OCR only where needed. force_ocr forces OCR on every text element — recommended for scanned documents.
ocr_engine
string
OCR engine to use. UnsiloedBeta (default) — handles rotated/warped text and irregular bounding boxes. UnsiloedHawk — higher accuracy for complex layouts and mixed content. UnsiloedStorm — enterprise-grade accuracy optimized for 50+ languages.
merge_tables
boolean
Merge tables that span multiple pages into a single unified structure. Defaults to false.
extract_charts
boolean
Extract structured data from charts and graphs. Defaults to false.
detect_checkboxes
boolean
Detect and extract checkbox states from forms. Defaults to false.
Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false.
extract_colors
boolean
Transfer text color from the PDF text layer to OCR results. Defaults to false.
xml_citation
boolean
Extract and hyperlink bibliography citations in the markdown output. Defaults to false.
page_range
string
Restrict processing to a subset of pages. Formats: "1-5", "2,4,6", "[1,3,5]". Defaults to all pages.
expires_in
integer
Seconds until the task and its output are deleted.
segment_type_naming
string
Segment type naming convention. Unsiloed (default) or Other.
validate_segments
array
JSON array of segment types to validate with VLM. Example: ["Table", "Formula"].
validate_table_segments
boolean
Legacy: validate table segments using VLM. Prefer validate_segments instead.
chunk_processing
object
JSON object controlling how segments are grouped into chunks. See the v1 parse endpoint for full schema details.
segment_processing
object
JSON object controlling per-segment-type processing (HTML/Markdown generation strategy, content source, model). See the v1 parse endpoint for full schema details.
llm_processing
object
JSON object for LLM processing configuration.
output_fields
object
JSON object controlling which output fields are included in the response.
error_handling
string
How to handle per-page errors. Continue (default) skips failed pages and continues. Fail aborts the entire job on the first error.
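The page_range formats above ("1-5", "2,4,6", "[1,3,5]") can be validated client-side before creating a job. A sketch of a hypothetical helper (parse_page_range is not part of the API) that expands each form into an explicit page list:

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a page_range spec ("1-5", "2,4,6", or "[1,3,5]") into page numbers."""
    spec = spec.strip().strip("[]")          # "[1,3,5]" -> "1,3,5"
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:                      # range form, e.g. "1-5"
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:                                # single page
            pages.append(int(part))
    return pages

print(parse_page_range("1-5"))
print(parse_page_range("2,4,6"))
print(parse_page_range("[1,3,5]"))
```

Expanding the spec locally also lets you confirm the page count you expect before the job is billed and enqueued.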

Response

job_id
string
Unique identifier for the parse job. Use this with GET /parse/{job_id} to poll status and retrieve results.
upload_url
string
Presigned PUT URL. Upload your file directly to this URL — no api-key header needed for the upload itself. Valid until expires_at.
expires_at
string
RFC 3339 timestamp when upload_url expires. Upload must complete before this time.
upload_method
string
Always "PUT". Use an HTTP PUT request when uploading to upload_url.
upload_headers
object
Key-value headers you MUST include in your PUT request. Always includes Content-Type set to the MIME type inferred from file_name.
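Because upload_url is only valid until expires_at, a client can check how much time remains before starting a large upload. A minimal sketch (the sample timestamp comes from the example response; the .replace call handles the Z suffix on Python versions before 3.11):

```python
from datetime import datetime, timezone

def seconds_until_expiry(expires_at: str) -> float:
    """Seconds remaining before the presigned URL expires (negative if past)."""
    # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (expiry - datetime.now(timezone.utc)).total_seconds()

remaining = seconds_until_expiry("2025-10-22T07:06:16Z")
if remaining <= 0:
    print("URL expired - create a new job to get a fresh upload_url")
else:
    print(f"{remaining:.0f} seconds left to upload")
```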

Step-by-Step Guide

Step 1 — Create the parse job

POST to /v2/parse/upload with your file name and configuration. The API returns a job_id and a short-lived presigned URL.
import requests

API_KEY = "your-api-key"
BASE_URL = "https://prod.visionapi.unsiloed.ai"

response = requests.post(
    f"{BASE_URL}/v2/parse/upload",
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "file_name": "report.pdf",
        "use_high_resolution": True,
        "segmentation_method": "smart_layout_detection",
        "ocr_mode": "auto_ocr",
        "merge_tables": False,
    },
)
response.raise_for_status()
data = response.json()

job_id = data["job_id"]
upload_url = data["upload_url"]
upload_headers = data["upload_headers"]

print(f"Job ID: {job_id}")
print(f"Upload URL expires: {data['expires_at']}")
The response contains the presigned URL and required headers:
{
  "job_id": "a3f1c2d4-7e8b-4a9f-b2c1-123456789abc",
  "upload_url": "https://upload.visionapi.unsiloed.ai/uploads/a3f1c2d4.../report.pdf?signature=...",
  "expires_at": "2025-10-22T07:06:16Z",
  "upload_method": "PUT",
  "upload_headers": {
    "Content-Type": "application/pdf"
  }
}

Step 2 — Upload the file

Use the upload_url and upload_headers from the response to PUT your file. No API key is needed for this request.
with open("report.pdf", "rb") as f:
    put_response = requests.put(upload_url, headers=upload_headers, data=f)
put_response.raise_for_status()

print("Upload complete — job is now queued for processing")
You must include every header listed in upload_headers. A missing or mismatched Content-Type will cause the upload to be rejected with a 403.
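When a PUT does come back 403, the presigned URL cannot be salvaged; the recovery path is to create a fresh job and retry the upload. A hedged sketch, where create_job and do_upload are stand-ins for the Step 1 and Step 2 requests above:

```python
def upload_with_retry(create_job, do_upload, max_attempts: int = 2):
    """Retry the presigned upload with a fresh URL when the PUT returns 403.

    create_job() -> (job_id, upload_url, upload_headers)   # Step 1 above
    do_upload(url, headers) -> HTTP status code            # Step 2 above
    Both are hypothetical stand-ins for the requests shown earlier.
    """
    for attempt in range(max_attempts):
        job_id, url, headers = create_job()
        status = do_upload(url, headers)
        if status == 200:
            return job_id
        if status != 403:        # anything other than an expired/invalid URL
            raise RuntimeError(f"upload failed with HTTP {status}")
    raise RuntimeError("upload kept failing with 403 - check upload_headers")

# Demo with stand-ins: the first attempt hits a 403 (expired URL), the retry succeeds.
_statuses = iter([403, 200])
job = upload_with_retry(
    create_job=lambda: ("job-123", "https://example.invalid/presigned", {}),
    do_upload=lambda url, headers: next(_statuses),
)
print(job)
```

Note that each retry creates a new job (and therefore a new job_id); poll the job_id returned by the attempt that actually uploaded.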

Step 3 — Poll for results

Once the upload completes, the job transitions AwaitingUpload → Queued → Processing → Succeeded (or Failed). Poll GET /parse/{job_id} using the same endpoint as v1.
import time

while True:
    status_response = requests.get(
        f"{BASE_URL}/parse/{job_id}",
        headers={"api-key": API_KEY},
    )
    status_response.raise_for_status()
    job = status_response.json()
    print(f"Status: {job['status']}")

    if job["status"] == "Succeeded":
        print(f"Done! {job['total_chunks']} chunks extracted.")
        break
    elif job["status"] == "Failed":
        raise RuntimeError(f"Job failed: {job.get('message')}")

    time.sleep(5)
See Get Parse Job Status for the full response schema.
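The loop above polls at a fixed 5-second interval; for long documents, a bounded poller with exponential backoff wastes fewer requests. A sketch (the timeout and delay caps are illustrative, not API limits; fetch_status stands in for the GET /parse/{job_id} call above):

```python
import time

def poll_with_backoff(fetch_status, timeout: float = 600.0,
                      initial_delay: float = 2.0, max_delay: float = 30.0) -> dict:
    """Poll fetch_status() until a terminal state, doubling the delay each round.

    fetch_status is a stand-in for the GET /parse/{job_id} request above and
    must return the job dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("Succeeded", "Failed"):
            return job
        time.sleep(delay)
        delay = min(delay * 2, max_delay)    # exponential backoff, capped
    raise TimeoutError("job did not reach a terminal state in time")
```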
The same flow in cURL:

# Step 1: Create job and get presigned URL
RESPONSE=$(curl -s -X POST https://prod.visionapi.unsiloed.ai/v2/parse/upload \
  -H "api-key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "report.pdf",
    "use_high_resolution": true,
    "segmentation_method": "smart_layout_detection",
    "ocr_mode": "auto_ocr",
    "merge_tables": false
  }')

JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id')
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.upload_url')

# Step 2: Upload file
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @report.pdf

# Step 3: Poll for results
curl -X GET "https://prod.visionapi.unsiloed.ai/parse/$JOB_ID" \
  -H "api-key: your-api-key"
{
  "job_id": "a3f1c2d4-7e8b-4a9f-b2c1-123456789abc",
  "upload_url": "https://upload.visionapi.unsiloed.ai/uploads/a3f1c2d4.../report.pdf?signature=...",
  "expires_at": "2025-10-22T07:06:16Z",
  "upload_method": "PUT",
  "upload_headers": {
    "Content-Type": "application/pdf"
  }
}

Why Use v2

Feature | v1 (POST /parse) | v2 (POST /v2/parse/upload)
File delivery | Through API server | Direct via presigned URL
Max file size | Limited by server upload | Up to 5 GB via direct PUT
Upload speed | Bottlenecked by API server | Full client bandwidth
Concurrency | Shared server capacity | No server contention
Best for | Quick uploads, small files | Large files, high volume

Error Handling

Status | Cause | Action
400 | Invalid or missing file_name, unsupported extension | Fix file_name and retry
401 | Missing or invalid api-key (handled by authentication middleware) | Check your API key
429 | Rate limit exceeded | Back off and retry after 1 second
503 | Job queue is at capacity | Back off and retry after the Retry-After header value
403 on upload | Missing or wrong headers, expired URL | Check upload_headers, get a new URL
Job status Failed | Processing error | Check message field in the status response
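The 429 and 503 cases both call for backing off; one way to centralize that is a wrapper that honors the Retry-After header when present and falls back to 1 second otherwise. A sketch, where send stands in for any of the API calls above:

```python
import time

def send_with_backoff(send, max_attempts: int = 5):
    """Retry a request on 429/503, waiting per the Retry-After header.

    send() is a stand-in for any of the API calls above and must return an
    object with .status_code and .headers (e.g. a requests.Response).
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in (429, 503):
            return response
        # Honor Retry-After if the server sent it; default to 1 second.
        wait = float(response.headers.get("Retry-After", 1))
        time.sleep(wait)
    raise RuntimeError("still rate-limited after retries")
```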

Authorizations

Authorization
string
header
required

API key for authentication. Use 'Bearer <your_api_key>'

Body

application/json

Request body for POST /v2/parse/upload. Configuration fields mirror the existing /parse multipart form fields.

file_name
string
required

File name with extension. Required. Determines content-type.

chunk_processing
any

JSON object for chunk processing configuration.

detect_checkboxes
boolean | null

Detect checkboxes using Swin Transformer Detection API. Defaults to false.

error_handling
string | null

Error handling strategy: Continue (default) or Fail.

expires_in
integer<int32> | null

Seconds until the task and its output are deleted.

extract_charts
boolean | null

Extract structured data from charts and graphs. Defaults to false.

extract_colors
boolean | null

Transfer text color from the PDF text layer to OCR results. Defaults to false.

Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false.

llm_processing
any

JSON object for LLM processing configuration.

merge_tables
boolean | null

Merge tables that span multiple pages into a single unified structure. Defaults to false.

ocr_engine
string | null

OCR engine: UnsiloedBeta (default), UnsiloedHawk, or UnsiloedStorm.

ocr_mode
string | null

OCR processing mode. auto_ocr (default) or force_ocr.

output_fields
any

JSON object controlling which output fields are included in the response.

page_range
string | null

Page range to process. Formats: "1-5", "2,4,6", "[1,3,5]". Defaults to all pages.

segment_processing
any

JSON object for segment processing/analysis configuration.

segment_type_naming
string | null

Segment type naming convention: Unsiloed (default) or Other.

segmentation_method
string | null

Segmentation strategy: smart_layout_detection (default) or page_by_page.

use_high_resolution
boolean | null

Use high-resolution images for cropping and post-processing. Defaults to true.

validate_segments
any

JSON array of segment types to validate with VLM. Example: ["Table","Formula"].

validate_table_segments
boolean | null

Legacy: validate table segments using VLM. Prefer validate_segments instead.

xml_citation
boolean | null

Extract and hyperlink bibliography citations in the markdown output. Defaults to false.

Response

Upload URL created

Response from POST /v2/parse/upload.

expires_at
string
required

RFC 3339 timestamp when upload_url expires.

job_id
string
required

Use this ID to poll GET /parse/{job_id} for status.

upload_headers
object
required

Headers the client MUST include in the PUT request.

upload_method
string
required

Always "PUT".

upload_url
string
required

S3 presigned PUT URL. Valid until expires_at.