GET /parse/{job_id}
curl -X 'GET' \
  'https://prod.visionapi.unsiloed.ai/parse/04a7a6d8-5ef7-465a-b22a-8a98e7104dd9' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key'
{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Starting",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "metadata": {}
}

Overview

The Get Parse Job Status endpoint lets you check the current status of a parsing job and retrieve the complete results once processing finishes. It is specific to the parsing API and returns comprehensive document analysis, including text extraction, image recognition, table parsing, and OCR data.
Parsing jobs are processed asynchronously. Use this endpoint to poll for completion and retrieve results when the job status is "Succeeded".

Parameters

job_id
string
required
Job ID returned by POST /parse.
base64_urls
boolean
Return segment images as base64-encoded data URIs instead of S3 presigned URLs. Defaults to false.
include_chunks
boolean
Include the chunks array in the response. Defaults to true.
output_file
boolean
Return a presigned S3 URL to the raw output JSON file instead of inlining the full response body. Defaults to false.
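The query parameters above can be combined on a single request. A minimal sketch of building the request URL with the standard library (parameter names come from this page; the surrounding client code is illustrative):

```python
from urllib.parse import urlencode

# Query parameters documented above; the values here are illustrative.
params = {
    "base64_urls": "false",    # keep S3 presigned URLs for segment images
    "include_chunks": "true",  # inline the chunks array in the response
    "output_file": "false",    # inline the full response body
}

base_url = "https://prod.visionapi.unsiloed.ai/parse"
job_id = "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9"
url = f"{base_url}/{job_id}?{urlencode(params)}"

print(url)
# The request itself is sent with the api-key header, e.g.:
# requests.get(url, headers={"api-key": "your-api-key"})
```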

Response

job_id
string
Job identifier.
status
string
Current job status: Starting, Processing, Succeeded, Failed, or Cancelled.
created_at
string
ISO 8601 timestamp when the job was created.
metadata
object
Citation or job metadata. Populated when xml_citation is enabled or from the job record.
started_at
string
ISO 8601 timestamp when processing started. Present when status is not Starting.
finished_at
string
ISO 8601 timestamp when processing completed. Present when status is Succeeded or Failed.
total_chunks
integer
Total number of document chunks. Present when status is Succeeded.
page_count
integer
Number of pages in the document. Present when status is Succeeded.
chunks
array
Array of document chunks with segments and extracted content. Present when status is Succeeded.
pdf_url
string
Presigned S3 URL to the generated PDF. Present when status is Succeeded.
file_name
string
Original file name from the job record.
file_type
string
MIME type of the uploaded file.
file_url
string
S3 URL of the original uploaded file.
credit_used
integer
Credits used for this job.
message
string
Error or status detail message. Present when status is Failed.
configuration
object
Configuration used for this job, mirroring the parameters submitted at creation time.
merge_tables
boolean
Whether table merging was enabled for this job.
id
string
Internal Supabase job record ID.
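For contrast with the Starting response shown above, a Succeeded response also carries the output fields. The shape below follows the field list on this page; all values are illustrative, and the chunks array and URLs are truncated:

```json
{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Succeeded",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "started_at": "2025-10-22T06:51:18.102511Z",
  "finished_at": "2025-10-22T06:52:41.559204Z",
  "metadata": {},
  "total_chunks": 12,
  "page_count": 4,
  "chunks": [ ... ],
  "pdf_url": "https://...",
  "file_name": "report.pdf",
  "file_type": "application/pdf",
  "credit_used": 4
}
```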

Job Status Values

Starting

Job has been created and is waiting to be processed. This is the initial status when a parsing job is first created.

Processing

Job is currently being processed. This includes PDF parsing, text extraction, image analysis, table detection, and OCR processing.

Succeeded

Job has completed successfully. The response includes the complete analysis results with all extracted data, images, and metadata.

Failed

Job failed during processing. Check the message field for details about what went wrong.

Cancelled

Job was cancelled before processing completed.
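The five statuses split into terminal and non-terminal states. A tiny helper (illustrative, not part of the API) makes polling loops easier to read:

```python
# Status strings documented above. Succeeded, Failed, and Cancelled
# are terminal: the job will never change state again.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}

def is_terminal(status: str) -> bool:
    """Return True when the job will not change state again."""
    return status in TERMINAL_STATUSES

print(is_terminal("Processing"))  # still running: keep polling
print(is_terminal("Succeeded"))   # done: read the results
```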

Polling Strategy

For long-running parsing jobs, implement a polling strategy to check status periodically:
import requests
import time

def poll_parse_job(job_id, api_key, max_wait_time=300, poll_interval=5):
    """Poll a parsing job until completion, failure, cancellation, or timeout."""

    start_time = time.time()
    headers = {"api-key": api_key}

    while time.time() - start_time < max_wait_time:
        response = requests.get(
            f"https://prod.visionapi.unsiloed.ai/parse/{job_id}",
            headers=headers,
            timeout=30,  # guard against a hung connection
        )

        if response.status_code == 200:
            job = response.json()
            status = job["status"]

            if status == "Succeeded":
                return job
            elif status == "Failed":
                raise Exception(f"Job failed: {job.get('message', 'Unknown error')}")
            elif status == "Cancelled":
                # Terminal state: stop polling instead of waiting out the timeout.
                raise Exception("Job was cancelled before processing completed")
            elif status in ("Starting", "Processing"):
                print(f"Job status: {status} - waiting...")
                time.sleep(poll_interval)
            else:
                print(f"Unknown status: {status}")
                time.sleep(poll_interval)
        else:
            print(f"Error checking status: {response.status_code}")
            time.sleep(poll_interval)

    raise Exception("Job polling timed out")

# Usage
try:
    result = poll_parse_job("04a7a6d8-5ef7-465a-b22a-8a98e7104dd9", "your-api-key")
    print("Job completed successfully!")
    print(f"Total chunks: {result['total_chunks']}")
except Exception as e:
    print(f"Error: {e}")
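For jobs that may run several minutes, a fixed 5-second interval generates many redundant requests. One common refinement is exponential backoff with a cap; the generator below is a sketch, and its interval values are a suggestion rather than an API requirement:

```python
def backoff_intervals(base=2.0, factor=2.0, cap=30.0, max_wait=300.0):
    """Yield sleep intervals that grow geometrically up to `cap`,
    totalling at most roughly `max_wait` seconds."""
    elapsed = 0.0
    delay = base
    while elapsed < max_wait:
        interval = min(delay, cap)
        yield interval
        elapsed += interval
        delay *= factor

# First few intervals: 2, 4, 8, 16, 30, 30, ...
# Plug into the polling loop by replacing the fixed poll_interval:
#   for delay in backoff_intervals():
#       ...check status, return or raise on a terminal state...
#       time.sleep(delay)
```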

Segment Types

When a job succeeds, the response includes detailed analysis of different document segments:

Title

Top-level document titles, distinct from section headers.

SectionHeader

Document headers and titles that define section boundaries.

Text

Regular text content including paragraphs, sentences, and individual text elements.

ListItem

Individual items within ordered or unordered lists.

Table

Tabular data with structured rows and columns.

Picture

Images and graphics within the document, including logos, charts, and illustrations.

Caption

Text captions associated with images or figures.

Formula

Mathematical or chemical formulas detected within the document.

Footnote

Footnote text appearing at the bottom of a page.

PageHeader

Recurring header content appearing at the top of pages.

PageFooter

Recurring footer content appearing at the bottom of pages.

Page

A full-page segment when the document is processed without fine-grained layout analysis.

Each segment includes:
  • segment_type: Type of content detected
  • content: Extracted text content
  • image: URL to extracted image (if applicable)
  • page_number: Page where the segment appears
  • confidence: Confidence score for the extraction
  • bbox: Precise coordinates of the segment
  • html: HTML-formatted content
  • markdown: Markdown-formatted content
  • ocr: Detailed OCR data with individual text elements
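The chunk and segment fields above can be walked with ordinary dict access. A sketch that concatenates the markdown of every segment (field names are taken from this page, assuming each chunk carries a segments array as described under Response; the sample payload is made up):

```python
def document_markdown(job: dict) -> str:
    """Join the markdown of every segment in order, skipping empty ones."""
    parts = []
    for chunk in job.get("chunks", []):
        for segment in chunk.get("segments", []):
            md = segment.get("markdown")
            if md:
                parts.append(md)
    return "\n\n".join(parts)

# Illustrative Succeeded payload using the segment fields documented above.
job = {
    "status": "Succeeded",
    "chunks": [
        {"segments": [
            {"segment_type": "Title", "markdown": "# Quarterly Report", "page_number": 1},
            {"segment_type": "Text", "markdown": "Revenue grew 12%.", "page_number": 1},
        ]},
    ],
}

print(document_markdown(job))
```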

Error Handling

Common Error Scenarios

  1. Job Not Found: Invalid or expired job ID returns a 404 response.
  2. Invalid API Key: Authentication failed.
  3. Client-Side Polling Timeout: The job did not complete within the time your polling logic allows. This is not a server-returned error; implement a reasonable client-side timeout and handle it gracefully.
  4. Server Error: Internal processing error returns a 500 response.
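The scenarios above can be folded into one dispatch before reading the response body. A sketch (the 404 and 500 meanings come from the list above; treating authentication failures as 401/403, and the helper itself, are assumptions):

```python
def check_response(status_code: int) -> str:
    """Map an HTTP status code from GET /parse/{job_id} to a recommended action."""
    if status_code == 200:
        return "ok"            # read the job body
    if status_code == 404:
        return "not_found"     # invalid or expired job ID: stop polling
    if status_code in (401, 403):
        return "auth_error"    # authentication failed: fix the API key
    if status_code >= 500:
        return "retry"         # internal processing error: back off and retry
    return "unexpected"        # surface anything else to the caller

print(check_response(404))
```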

Best Practices

  • Polling Frequency: Check status every 5-10 seconds for long-running jobs
  • Timeout Handling: Implement reasonable timeouts to prevent infinite polling
  • Error Recovery: Handle failed jobs gracefully with retry logic
  • API Key Security: Keep your API key secure and never expose it in client-side code

Rate Limits

  • Status Checks: Rate limits apply to prevent abuse
  • Concurrent Jobs: Limited number of active parsing jobs per API key
  • Request Frequency: Avoid excessive polling (recommended: 5-10 second intervals)
Check your API plan for specific limits and quotas.

Authorizations

Authorization
string
header
required

API key for authentication. Use 'Bearer <your_api_key>'.

Path Parameters

job_id
string
required

Job ID returned by POST /parse.

Query Parameters

base64_urls
boolean

Return segment images as base64-encoded data URIs instead of S3 presigned URLs. Defaults to false.

include_chunks
boolean

Include the chunks array in the response. Defaults to true.

output_file
boolean

Return a presigned S3 URL to the raw output JSON file instead of inlining the full response body. Defaults to false.

Response

Job status and results. Output fields (chunks, total_chunks, page_count, pdf_url) are present only when status is Succeeded.

Response body for GET /parse/{job_id}.

Fields marked as optional appear only when the job has reached the relevant status.

created_at
string
required

ISO 8601 timestamp when the job was created.

job_id
string
required

Job identifier.

metadata
object
required

Citation or job metadata. Populated when xml_citation is enabled or from the job record.

status
string
required

Current job status: Starting, Processing, Succeeded, Failed, or Cancelled.

chunks
object[] | null

Array of document chunks with segments and extracted content. Present when status is Succeeded.

configuration
object

Configuration used for this job (mirrors the parameters submitted at creation time).

credit_used
integer<int64> | null

Credits used for this job.

file_name
string | null

Original file name from the job record.

file_type
string | null

MIME type of the uploaded file.

file_url
string | null

S3 URL of the original uploaded file.

finished_at
string | null

ISO 8601 timestamp when processing completed. Present when status is Succeeded or Failed.

id
string | null

Internal Supabase job record ID.

merge_tables
boolean | null

Whether table merging was enabled for this job.

message
string | null

Error or status detail message. Present when status is Failed.

page_count
integer<int64> | null

Number of pages in the document. Present when status is Succeeded.

pdf_url
string | null

Presigned S3 URL to the generated PDF. Present when status is Succeeded.

started_at
string | null

ISO 8601 timestamp when processing started. Present when status is not Starting.

total_chunks
integer<int64> | null

Total number of document chunks. Present when status is Succeeded.