GET /jobs/{job_id}
curl -X GET "https://visionapi.unsiloed.ai/jobs/b2094b38-e432-44b6-a5d0-67bed07d5de1" \
  -H "X-API-Key: your-api-key"
{
  "id": "b2094b38-e432-44b6-a5d0-67bed07d5de1",
  "status": "queued",
  "type": "extraction",
  "created_at": "2024-01-15T10:30:00.000Z",
  "updated_at": "2024-01-15T10:30:00.000Z",
  "pdf_name": "financial_report.pdf",
  "pdf_hash": "sha256:abc123...",
  "user_id": "user_123"
}

Overview

The Get Job Status endpoint allows you to check the current status of any asynchronous processing job. This is essential for monitoring long-running operations like document extraction, parsing, and batch processing.

Jobs are stored in Supabase and updated in real-time. Status checks are lightweight and can be polled frequently.
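
Since a status check is just an authenticated GET, the simplest client is a one-function wrapper. A minimal sketch in Python using the requests library:

import requests

API_BASE = "https://visionapi.unsiloed.ai"

def get_job_status(job_id, api_key):
    """Fetch the current status of a job (lightweight enough to poll)"""
    response = requests.get(
        f"{API_BASE}/jobs/{job_id}",
        headers={"X-API-Key": api_key},
    )
    response.raise_for_status()
    return response.json()

job = get_job_status("b2094b38-e432-44b6-a5d0-67bed07d5de1", "your-api-key")
print(job["status"])  # e.g. "queued"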

Request

job_id
string
required

The unique identifier of the job to check

X-API-Key
string

API key for authentication (optional for some endpoints)

Response

id
string

Unique identifier for the job

status
string

Current job status: "queued", "PROCESSING", "COMPLETED", or "FAILED"

type
string

The operation type for this job (e.g., "extraction", "chunking", "classification")

created_at
string

Timestamp when the job was created

updated_at
string

Timestamp when the job was last updated

pdf_name
string

Original filename of the processed document

pdf_hash
string

Hash of the PDF file for identification

error
string

Error message (if job failed)

Job Status Values

  • queued - the job has been accepted and is waiting to be processed
  • PROCESSING - the job is actively being worked on
  • COMPLETED - the job finished successfully and results are available
  • FAILED - the job did not complete; check the error field for details

Polling for Completion

For long-running jobs, implement polling to check status periodically:

import time
import requests

def wait_for_job_completion(job_id, api_key, poll_interval=5, max_wait=300):
    """Wait for job to complete with polling"""
    
    start_time = time.time()
    headers = {"X-API-Key": api_key}
    
    while time.time() - start_time < max_wait:
        response = requests.get(f"https://visionapi.unsiloed.ai/jobs/{job_id}", headers=headers)
        
        if response.status_code == 200:
            job = response.json()
            status = job['status']
            
            print(f"Job status: {status}")
            
            if status == 'COMPLETED':
                return job, True
            elif status == 'FAILED':
                return job, False
                
        time.sleep(poll_interval)
    
    raise TimeoutError(f"Job {job_id} did not complete within {max_wait} seconds")

# Usage
job_id = "your-job-id"
job, success = wait_for_job_completion(job_id, "your-api-key")

if success:
    print("Job completed successfully!")
else:
    print(f"Job failed: {job.get('error', 'Unknown error')}")

Best Practices

Polling Frequency: Poll every 5-10 seconds for most jobs. Avoid polling more frequently than every 2 seconds.

Timeout Handling: Set reasonable timeouts (5-15 minutes) depending on document size and complexity.

Job Persistence: Completed jobs and their results are stored for 7 days before cleanup.
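
These limits can be wired into the polling helper above; a small sketch (the helper name and constants are illustrative) that clamps the interval to the 2-second floor and keeps the timeout in the recommended band:

MIN_POLL_INTERVAL = 2   # seconds; the documented minimum between polls
DEFAULT_MAX_WAIT = 600  # 10 minutes, inside the recommended 5-15 minute range

def poll_job_safely(job_id, api_key, poll_interval=5, max_wait=DEFAULT_MAX_WAIT):
    """Delegate to wait_for_job_completion with the documented limits applied"""
    interval = max(poll_interval, MIN_POLL_INTERVAL)
    return wait_for_job_completion(job_id, api_key,
                                   poll_interval=interval, max_wait=max_wait)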

Batch Status Check

Check the status of all jobs in a batch.

import requests

headers = {"X-API-Key": "your-api-key"}

response = requests.get(
    "https://api.example.com/jobs/batch/batch_xyz789abc123",
    headers=headers,
    params={"include_progress": "true"}  # lowercase string, matching the documented default "false"
)

batch_status = response.json()
print(f"Batch Status: {batch_status['status']}")
print(f"Completed Jobs: {batch_status['completed_jobs']}/{batch_status['total_jobs']}")

for job in batch_status['jobs']:
    print(f"  {job['job_id']}: {job['status']}")
{
  "batch_id": "batch_xyz789abc123",
  "status": "processing",
  "created_at": "2024-01-15T10:30:00Z",
  "total_jobs": 5,
  "completed_jobs": 3,
  "failed_jobs": 0,
  "cancelled_jobs": 0,
  "overall_progress": 60,
  "jobs": [
    {
      "job_id": "job_1",
      "status": "completed",
      "operation": "extract",
      "progress": 100
    },
    {
      "job_id": "job_2",
      "status": "completed", 
      "operation": "parse",
      "progress": 100
    },
    {
      "job_id": "job_3",
      "status": "processing",
      "operation": "classify",
      "progress": 75
    }
  ]
}
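
To block until a whole batch finishes, the counters in this response can drive a polling loop; a sketch, assuming the batch endpoint and fields shown above:

import time
import requests

headers = {"X-API-Key": "your-api-key"}

def wait_for_batch(batch_id, poll_interval=10):
    """Poll the batch endpoint until every job reaches a terminal state"""
    while True:
        response = requests.get(
            f"https://api.example.com/jobs/batch/{batch_id}",
            headers=headers,
        )
        batch = response.json()
        done = batch["completed_jobs"] + batch["failed_jobs"] + batch["cancelled_jobs"]
        if done >= batch["total_jobs"]:
            return batch
        print(f"Batch progress: {batch['overall_progress']}%")
        time.sleep(poll_interval)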

Multiple Jobs Status

Check status of multiple jobs simultaneously.

Query Parameters

job_ids
string
required

Comma-separated list of job IDs

include_progress
boolean
default:"false"

Include progress details for all jobs

import requests

headers = {"X-API-Key": "your-api-key"}

job_ids = ["job_abc123", "job_def456", "job_ghi789"]
params = {
    "job_ids": ",".join(job_ids),
    "include_progress": "true"
}

response = requests.get(
    "https://api.example.com/jobs/status",
    headers=headers,
    params=params
)

jobs_status = response.json()
for job in jobs_status['jobs']:
    print(f"{job['job_id']}: {job['status']} ({job.get('progress', {}).get('percentage', 0)}%)")

Job Status Polling

Polling Best Practices

import time
import requests

headers = {"X-API-Key": "your-api-key"}

def wait_for_job_completion(job_id, max_wait_time=300, poll_interval=5):
    """
    Poll job status until completion or timeout
    """
    start_time = time.time()
    
    while time.time() - start_time < max_wait_time:
        response = requests.get(
            f"https://api.example.com/jobs/{job_id}",
            headers=headers
        )
        
        if response.status_code == 200:
            job = response.json()
            status = job['status']
            
            print(f"Job {job_id}: {status}")
            
            if status in ['completed', 'failed', 'cancelled']:
                return job
                
            # Show progress if available
            if 'progress' in job:
                print(f"  Progress: {job['progress']['percentage']}%")
        
        time.sleep(poll_interval)
    
    raise TimeoutError(f"Job {job_id} did not complete within {max_wait_time} seconds")

# Usage
try:
    completed_job = wait_for_job_completion("job_abc123def456")
    if completed_job['status'] == 'completed':
        print("Job completed successfully!")
    else:
        print(f"Job failed: {completed_job.get('error_details', {}).get('message')}")
except TimeoutError as e:
    print(f"Timeout: {e}")

Exponential Backoff Polling

import time
import requests

headers = {"X-API-Key": "your-api-key"}

def poll_with_backoff(job_id, initial_interval=2, max_interval=30, backoff_factor=1.5):
    """
    Poll with exponential backoff to reduce API calls
    """
    interval = initial_interval
    
    while True:
        response = requests.get(f"https://api.example.com/jobs/{job_id}", headers=headers)
        job = response.json()
        
        if job['status'] in ['completed', 'failed', 'cancelled']:
            return job
            
        print(f"Job {job_id}: {job['status']} - waiting {interval}s")
        time.sleep(interval)
        
        # Increase interval for next poll
        interval = min(interval * backoff_factor, max_interval)
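
When many clients poll the same way, they can fall into lockstep after an outage. Adding a small random jitter to each wait spreads the load; this is an optional client-side refinement, not an API feature:

import random
import time

def jittered_sleep(interval, jitter=0.25):
    """Sleep for the base interval plus up to 25% random extra delay"""
    time.sleep(interval * (1 + random.uniform(0, jitter)))

Replace time.sleep(interval) in poll_with_backoff with jittered_sleep(interval) to apply it.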

Real-time Status Updates

WebSocket Connection

// updateProgressBar, handleJobCompletion, and handleJobFailure are
// application-defined callbacks, not part of the API.
const ws = new WebSocket('wss://api.example.com/jobs/job_abc123def456/status');

ws.onopen = function() {
    console.log('Connected to job status stream');
};

ws.onmessage = function(event) {
    const statusUpdate = JSON.parse(event.data);
    
    console.log(`Job ${statusUpdate.job_id}: ${statusUpdate.status}`);
    
    if (statusUpdate.progress) {
        console.log(`Progress: ${statusUpdate.progress.percentage}%`);
        updateProgressBar(statusUpdate.progress.percentage);
    }
    
    if (statusUpdate.status === 'completed') {
        console.log('Job completed!');
        ws.close();
        handleJobCompletion(statusUpdate);
    } else if (statusUpdate.status === 'failed') {
        console.log('Job failed:', statusUpdate.error_details);
        ws.close();
        handleJobFailure(statusUpdate);
    }
};

ws.onerror = function(error) {
    console.error('WebSocket error:', error);
};
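
For Python clients, the same stream can be consumed with the third-party websockets package (this assumes the server sends plain JSON messages like those handled above):

import asyncio
import json
import websockets  # pip install websockets

async def stream_job_status(job_id):
    """Print each status update until the job reaches a terminal state"""
    uri = f"wss://api.example.com/jobs/{job_id}/status"
    async with websockets.connect(uri) as ws:
        async for message in ws:
            update = json.loads(message)
            print(f"Job {update['job_id']}: {update['status']}")
            if update['status'] in ('completed', 'failed'):
                return update

completed = asyncio.run(stream_job_status("job_abc123def456"))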

Job Status Filters

List jobs with status filtering.

Query Parameters

status
string

Filter by status: "queued", "processing", "completed", "failed", "cancelled"

operation
string

Filter by operation type: "extract", "parse", "split", "classify"

created_after
string

ISO timestamp - only jobs created after this time

created_before
string

ISO timestamp - only jobs created before this time

tags
string

Comma-separated list of tags to filter by

limit
number
default:"50"

Maximum number of jobs to return (1-100)

offset
number
default:"0"

Number of jobs to skip for pagination

# Get all failed extraction jobs from the last 24 hours
import requests
from datetime import datetime, timedelta

headers = {"X-API-Key": "your-api-key"}

yesterday = (datetime.utcnow() - timedelta(days=1)).isoformat() + 'Z'

params = {
    'status': 'failed',
    'operation': 'extract',
    'created_after': yesterday,
    'limit': 20
}

response = requests.get(
    "https://api.example.com/jobs",
    headers=headers,
    params=params
)

failed_jobs = response.json()
print(f"Found {len(failed_jobs['jobs'])} failed extraction jobs")

for job in failed_jobs['jobs']:
    print(f"  {job['job_id']}: {job['error_details']['message']}")
{
  "jobs": [
    {
      "job_id": "job_abc123def456",
      "status": "completed",
      "operation": "extract",
      "created_at": "2024-01-15T10:30:00Z",
      "completed_at": "2024-01-15T10:35:00Z",
      "file_count": 2,
      "tags": ["invoices", "batch-2024-01"]
    },
    {
      "job_id": "job_def456ghi789",
      "status": "failed",
      "operation": "parse", 
      "created_at": "2024-01-15T09:45:00Z",
      "error_details": {
        "error_type": "file_corrupted",
        "message": "Unable to read PDF file"
      }
    }
  ],
  "pagination": {
    "total": 156,
    "limit": 50,
    "offset": 0,
    "has_more": true
  }
}
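
The pagination object makes it straightforward to walk every page; a sketch that keeps advancing the offset until has_more is false:

import requests

headers = {"X-API-Key": "your-api-key"}

def list_all_jobs(base_params):
    """Collect every matching job across all pages"""
    jobs = []
    offset = 0
    while True:
        response = requests.get(
            "https://api.example.com/jobs",
            headers=headers,
            params={**base_params, "offset": offset},
        )
        page = response.json()
        jobs.extend(page["jobs"])
        if not page["pagination"]["has_more"]:
            return jobs
        offset += page["pagination"]["limit"]

all_failed = list_all_jobs({"status": "failed"})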

Status Change Notifications

Webhook Events

Jobs automatically send webhook notifications for status changes:

{
  "event": "job.status_changed",
  "job_id": "job_abc123def456",
  "previous_status": "processing",
  "current_status": "completed",
  "timestamp": "2024-01-15T10:35:42Z",
  "progress": {
    "percentage": 100,
    "files_completed": 3,
    "total_files": 3
  }
}
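
On the receiving side, a webhook endpoint only needs to parse this payload and branch on the event fields. A minimal sketch using Flask (the framework and route are illustrative choices, not part of the API):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/jobs", methods=["POST"])
def handle_job_event():
    event = request.get_json()
    if event["event"] == "job.status_changed":
        if event["current_status"] == "completed":
            print(f"Job {event['job_id']} completed")
        elif event["current_status"] == "failed":
            print(f"Job {event['job_id']} failed")
    return "", 204  # acknowledge quickly; do heavy processing asynchronously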

Email Notifications

Configure email notifications for job completion:

# Enable email notifications when creating job
data = {
    'operation': 'extract',
    'parameters': extraction_params,
    'notification_settings': {
        'email': 'user@example.com',
        'events': ['completed', 'failed'],
        'include_summary': True
    }
}

Error Handling

Common Error Scenarios

404
Not Found

Job ID does not exist or has been deleted

403
Forbidden

Job belongs to different account or insufficient permissions

429
Too Many Requests

Status check rate limit exceeded

Error Response Format

{
  "error": "Job not found",
  "details": {
    "error_type": "not_found",
    "message": "Job with ID 'job_invalid123' does not exist",
    "job_id": "job_invalid123"
  },
  "suggestions": [
    "Check the job ID for typos",
    "Verify the job belongs to your account",
    "Job may have been automatically deleted after retention period"
  ]
}
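
A client can branch on these status codes before trusting the response body; a sketch (the Retry-After header on 429 responses is an assumption - verify it against your actual responses):

import time
import requests

headers = {"X-API-Key": "your-api-key"}

def get_job_or_raise(job_id, max_retries=3):
    """Fetch a job, translating the documented error responses into exceptions"""
    for attempt in range(max_retries + 1):
        response = requests.get(f"https://api.example.com/jobs/{job_id}", headers=headers)
        if response.status_code == 404:
            raise LookupError(f"Job {job_id} not found (it may be past the retention period)")
        if response.status_code == 403:
            raise PermissionError(f"No access to job {job_id}")
        if response.status_code == 429 and attempt < max_retries:
            # Retry-After is an assumption; fall back to a fixed delay if absent
            time.sleep(int(response.headers.get("Retry-After", "5")))
            continue
        response.raise_for_status()
        return response.json()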

Performance Optimization

Efficient Status Checking

# Instead of checking each job individually
for job_id in job_ids:
    response = requests.get(f"https://api.example.com/jobs/{job_id}", headers=headers)  # one API call per job

# Use batch status check
response = requests.get(
    "https://api.example.com/jobs/status",
    headers=headers,
    params={"job_ids": ",".join(job_ids)}  # single API call for all jobs
)

Caching Status Results

import time
import requests
from functools import lru_cache

@lru_cache(maxsize=100)
def _fetch_job_status(job_id, time_bucket):
    """time_bucket changes once per cache window, so old entries stop being reused"""
    response = requests.get(f"https://api.example.com/jobs/{job_id}", headers=headers)
    return response.json()

def get_job_status_cached(job_id, cache_duration=30):
    """Cache job status for cache_duration seconds to reduce API calls"""
    return _fetch_job_status(job_id, int(time.time() // cache_duration))

# Usage - reuses the cached result when called within the same 30-second window
status = get_job_status_cached("job_abc123def456")

Rate Limits

  • Individual Status Check: 1000 requests per minute
  • Batch Status Check: 100 requests per minute
  • Job Listing: 60 requests per minute
  • WebSocket Connections: 10 concurrent connections per account

Rate limits are enforced per API key and reset on a rolling window basis.
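
To stay safely under these limits on the client side, space requests by a minimum interval; a simple throttle sketch (0.1 s between calls keeps individual checks around 600 per minute, under the 1000 limit):

import time

class Throttle:
    """Enforce a minimum delay between successive API calls"""
    def __init__(self, min_interval=0.1):
        self.min_interval = min_interval
        self.last_call = 0.0

    def wait(self):
        elapsed = time.time() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.time()

throttle = Throttle()
# Call throttle.wait() before each status check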