POST /tables
curl -X POST "https://visionapi.unsiloed.ai/tables" \
  -H "accept: application/json" \
  -H "api-key: your-api-key" \
  -H "Content-Type: multipart/form-data" \
  -F "pdf_file=@document.pdf;type=application/pdf"
{
  "job_id": "e4a73f3c-869e-4129-bdef-614a6a51b043",
  "status": "queued",
  "message": "PDF table extraction started",
  "quota_remaining": 48960
}

Overview

The Extract Tables endpoint processes PDF documents and extracts structured table data using AI-powered table detection and parsing. It is designed to identify, extract, and structure tabular content with high accuracy.

The endpoint returns a job ID for asynchronous processing. Use the job management endpoints to check status and retrieve results.
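
The same request can be made from Python with requests. The sketch below wraps it in a small helper that later examples on this page reuse; the helper name is illustrative, not part of the API:

import requests

def submit_table_job(pdf_path, api_key):
    """Upload a PDF for table extraction and return the job info."""
    url = "https://visionapi.unsiloed.ai/tables"
    headers = {"accept": "application/json", "api-key": api_key}
    
    # requests sets the multipart/form-data boundary automatically
    with open(pdf_path, "rb") as f:
        files = {"pdf_file": (pdf_path, f, "application/pdf")}
        response = requests.post(url, headers=headers, files=files)
    
    response.raise_for_status()
    return response.json()

# Usage
job = submit_table_job("document.pdf", "your-api-key")
print(job["job_id"], job["status"])  # e.g. "e4a73f3c-..." "queued"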

Request

pdf_file
file
required

The PDF file to process for table extraction. Maximum file size: 100MB

api-key
string
required

API key for authentication

Response

job_id
string

Unique identifier for the table extraction job

status
string

Initial job status (typically "queued")

message
string

Descriptive message about the job creation

quota_remaining
number

Number of API calls remaining in your quota

Job Management Integration

After creating a table extraction job, use the job management endpoints to monitor progress and retrieve results:

import time
import requests

def wait_for_table_extraction_completion(job_id, api_key):
    """Poll job status until completion"""
    
    headers = {"api-key": api_key}
    status_url = f"https://visionapi.unsiloed.ai/jobs/{job_id}"
    
    while True:
        response = requests.get(status_url, headers=headers)
        
        if response.status_code == 200:
            status_data = response.json()
            print(f"Job status: {status_data['status']}")
            
            if status_data['status'] == 'completed':
                # Get results
                results_url = f"https://visionapi.unsiloed.ai/jobs/{job_id}/results"
                results_response = requests.get(results_url, headers=headers)
                
                if results_response.status_code == 200:
                    return results_response.json()
                else:
                    raise Exception(f"Failed to get results: {results_response.text}")
                    
            elif status_data['status'] == 'failed':
                raise Exception(f"Job failed: {status_data.get('error', 'Unknown error')}")

        elif 400 <= response.status_code < 500:
            # Client errors (e.g. a bad job ID or API key) will not resolve on retry
            raise Exception(f"Status check failed: {response.text}")

        time.sleep(5)  # Wait 5 seconds before next check

# Usage
job_id = "e4a73f3c-869e-4129-bdef-614a6a51b043"
results = wait_for_table_extraction_completion(job_id, "your-api-key")
print("Table extraction completed:", results)

Expected Results Structure

When the job completes, the results will contain structured table data:

{
  "job_id": "e4a73f3c-869e-4129-bdef-614a6a51b043",
  "status": "completed",
  "results": {
    "tables": [
      {
        "table_id": "table_1",
        "page_number": 1,
        "bbox": {
          "x1": 50,
          "y1": 100,
          "x2": 550,
          "y2": 300
        },
        "confidence": 0.95,
        "headers": [
          "Product",
          "Quantity",
          "Price",
          "Total"
        ],
        "rows": [
          ["Widget A", "10", "$25.00", "$250.00"],
          ["Widget B", "5", "$40.00", "$200.00"],
          ["Widget C", "8", "$15.00", "$120.00"]
        ],
        "structured_data": {
          "headers": ["Product", "Quantity", "Price", "Total"],
          "data": [
            {
              "Product": "Widget A",
              "Quantity": "10",
              "Price": "$25.00",
              "Total": "$250.00"
            },
            {
              "Product": "Widget B",
              "Quantity": "5",
              "Price": "$40.00",
              "Total": "$200.00"
            },
            {
              "Product": "Widget C",
              "Quantity": "8",
              "Price": "$15.00",
              "Total": "$120.00"
            }
          ]
        },
        "table_type": "data_table",
        "column_count": 4,
        "row_count": 3
      }
    ],
    "summary": {
      "total_tables": 1,
      "pages_processed": 1,
      "extraction_method": "ai_vision",
      "processing_time": 4.2
    }
  }
}

Table Detection Features

The table extraction endpoint provides advanced features for accurate table detection and parsing:

Detection Capabilities

  • Multi-format Support: Handles various table formats including bordered, borderless, and complex nested tables
  • Column Alignment: Automatically detects column boundaries and alignments
  • Header Recognition: Identifies table headers and separates them from data rows
  • Merged Cells: Handles merged cells and complex table structures
  • Multi-page Tables: Detects tables that span across multiple pages

Data Structuring

  • Raw Data: Provides raw table data as arrays of rows and columns
  • Structured Format: Converts tables to structured JSON with named columns (see the sketch after this list)
  • Data Types: Attempts to identify data types (text, numbers, dates)
  • Formatting Preservation: Maintains original formatting where possible
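
Each record under structured_data['data'] is simply a raw row zipped with the table headers. A minimal sketch of the relationship, using sample values from the example response above:

def rows_to_records(headers, rows):
    """Convert raw row arrays into the named-column records
    found under structured_data['data'] in the results."""
    return [dict(zip(headers, row)) for row in rows]

headers = ["Product", "Quantity", "Price", "Total"]
rows = [["Widget A", "10", "$25.00", "$250.00"]]
print(rows_to_records(headers, rows))
# [{'Product': 'Widget A', 'Quantity': '10', 'Price': '$25.00', 'Total': '$250.00'}]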

Advanced Table Processing

Complex Table Handling

import requests

def extract_complex_tables(pdf_file, api_key):
    """Extract tables from complex documents with advanced processing"""
    
    url = "https://visionapi.unsiloed.ai/tables"
    headers = {"accept": "application/json", "api-key": api_key}
    
    # Open the file in a context manager so the handle is always closed
    with open(pdf_file, "rb") as f:
        files = {"pdf_file": (str(pdf_file), f, "application/pdf")}
        response = requests.post(url, headers=headers, files=files)
    
    if response.status_code == 200:
        job_info = response.json()
        results = wait_for_table_extraction_completion(job_info['job_id'], api_key)
        
        # Process extracted tables
        processed_tables = []
        
        for table in results['results']['tables']:
            processed_table = {
                'id': table['table_id'],
                'page': table['page_number'],
                'confidence': table['confidence'],
                'dimensions': {
                    'rows': table['row_count'],
                    'columns': table['column_count']
                },
                'data': table['structured_data']['data'],
                'headers': table['headers'],
                'location': table['bbox']
            }
            processed_tables.append(processed_table)
        
        return processed_tables
    else:
        raise Exception(f"Table extraction failed: {response.text}")

# Usage
tables = extract_complex_tables("financial_report.pdf", "your-api-key")
for table in tables:
    print(f"Table {table['id']} on page {table['page']}:")
    print(f"  Dimensions: {table['dimensions']['rows']} rows x {table['dimensions']['columns']} columns")
    print(f"  Confidence: {table['confidence']:.3f}")
    print(f"  Headers: {table['headers']}")

Financial Table Processing

import re

def extract_financial_tables(pdf_file, api_key):
    """Extract and process financial tables with data validation"""
    
    tables = extract_complex_tables(pdf_file, api_key)
    financial_data = []
    
    for table in tables:
        # Look for financial indicators
        headers = [h.lower() for h in table['headers']]
        
        if any(keyword in ' '.join(headers) for keyword in ['revenue', 'income', 'profit', 'loss', 'balance']):
            financial_table = {
                'table_id': table['id'],
                'page': table['page'],
                'table_type': 'financial',
                'data': table['data'],
                'metrics': []
            }
            
            # Extract financial metrics
            for row in table['data']:
                for header, value in row.items():
                    if any(keyword in header.lower() for keyword in ['total', 'revenue', 'income', 'profit']):
                        # Try to extract a numeric value (commas stripped below)
                        numeric_value = re.findall(r'[\d,]+\.?\d*', str(value))
                        if numeric_value:
                            financial_table['metrics'].append({
                                'metric': header,
                                'value': numeric_value[0].replace(',', ''),
                                'raw_value': value
                            })
            
            financial_data.append(financial_table)
    
    return financial_data

# Usage
financial_tables = extract_financial_tables("annual_report.pdf", "your-api-key")
for table in financial_tables:
    print(f"Financial table on page {table['page']}:")
    for metric in table['metrics']:
        print(f"  {metric['metric']}: {metric['raw_value']}")

Data Export Options

CSV Export

import csv
import io

def export_tables_to_csv(tables):
    """Export extracted tables to CSV format"""
    
    csv_files = {}
    
    for table in tables:
        # Create CSV content
        output = io.StringIO()
        writer = csv.writer(output)
        
        # Write headers
        writer.writerow(table['headers'])
        
        # Write data rows
        for row_data in table['structured_data']['data']:
            row = [row_data.get(header, '') for header in table['headers']]
            writer.writerow(row)
        
        csv_content = output.getvalue()
        csv_files[f"table_{table['table_id']}_page_{table['page_number']}.csv"] = csv_content
    
    return csv_files

# Usage
results = wait_for_table_extraction_completion(job_id, api_key)
tables = results['results']['tables']
csv_files = export_tables_to_csv(tables)

for filename, content in csv_files.items():
    with open(filename, 'w', newline='') as f:
        f.write(content)
    print(f"Exported: {filename}")

Excel Export

import pandas as pd

def export_tables_to_excel(tables, output_file):
    """Export extracted tables to Excel with multiple sheets"""
    
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        for table in tables:
            # Convert to DataFrame
            df = pd.DataFrame(table['structured_data']['data'])
            
            # Create sheet name
            sheet_name = f"Table_{table['table_id']}_P{table['page_number']}"
            
            # Write to Excel
            df.to_excel(writer, sheet_name=sheet_name, index=False)
            
            # Add metadata sheet
            metadata = pd.DataFrame([
                ['Table ID', table['table_id']],
                ['Page Number', table['page_number']],
                ['Confidence', table['confidence']],
                ['Rows', table['row_count']],
                ['Columns', table['column_count']],
                ['Table Type', table['table_type']]
            ], columns=['Property', 'Value'])
            
            metadata.to_excel(writer, sheet_name=f"{sheet_name}_Meta", index=False)

# Usage
results = wait_for_table_extraction_completion(job_id, api_key)
tables = results['results']['tables']
export_tables_to_excel(tables, "extracted_tables.xlsx")
print("Tables exported to extracted_tables.xlsx")

Error Handling

400
Bad Request

Invalid request parameters or malformed file

401
Unauthorized

Invalid or missing API key

413
Payload Too Large

File size exceeds 100MB limit

422
Unprocessable Entity

Invalid file format or processing error

429
Too Many Requests

Rate limit exceeded or quota exhausted

500
Internal Server Error

Server error during processing
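
A minimal sketch of mapping these codes to retry-or-fail decisions around the submit_table_job helper from the Overview (the backoff intervals are assumptions, not documented behavior):

import time
import requests

def submit_with_retry(pdf_path, api_key, max_retries=3):
    """Retry on transient errors; fail fast on client errors."""
    for attempt in range(max_retries):
        try:
            return submit_table_job(pdf_path, api_key)
        except requests.HTTPError as e:
            code = e.response.status_code
            if code == 429:
                # Rate limit or quota exhausted: back off and retry
                time.sleep(2 ** attempt * 10)
            elif code in (400, 401, 413, 422):
                # Client-side problems will not resolve on retry
                raise
            else:
                # 500 and other server errors: brief pause, then retry
                time.sleep(5)
    raise RuntimeError(f"Upload failed after {max_retries} attempts")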

Best Practices

Document Quality: Higher quality PDFs with clear table borders produce better results. Scanned documents may have lower accuracy.

Table Structure: Tables with consistent formatting, clear headers, and regular structure are extracted more accurately.

Multi-page Tables: For tables spanning multiple pages, the system attempts to merge them automatically based on column structure.

Complex Layouts: Tables with irregular structures, merged cells, or nested tables may require manual review of results.

Processing Time: Large documents with many tables may take longer to process. Monitor job status regularly.

Integration Examples

Automated Report Processing

from datetime import datetime

def process_monthly_reports(report_files, api_key):
    """Process multiple monthly reports and extract all tables"""
    
    all_tables = []
    
    for report_file in report_files:
        print(f"Processing {report_file}...")
        
        try:
            tables = extract_complex_tables(report_file, api_key)
            
            # Add source file information
            for table in tables:
                table['source_file'] = str(report_file)  # store as a plain string for export
                table['extraction_date'] = datetime.now().isoformat()
            
            all_tables.extend(tables)
            
        except Exception as e:
            print(f"Error processing {report_file}: {e}")
            continue
    
    return all_tables

# Usage
import pandas as pd
from pathlib import Path

report_directory = Path("monthly_reports")
report_files = list(report_directory.glob("*.pdf"))

all_extracted_tables = process_monthly_reports(report_files, "your-api-key")

print(f"Extracted {len(all_extracted_tables)} tables from {len(report_files)} reports")

# Export summary
summary_data = []
for table in all_extracted_tables:
    summary_data.append({
        'source_file': table['source_file'],
        'table_id': table['id'],
        'page': table['page'],
        'rows': table['dimensions']['rows'],
        'columns': table['dimensions']['columns'],
        'confidence': table['confidence']
    })

summary_df = pd.DataFrame(summary_data)
summary_df.to_csv("extraction_summary.csv", index=False)

Real-time Table Monitoring

import asyncio
import aiohttp

async def monitor_table_extraction(job_id, api_key, callback=None):
    """Monitor table extraction job with real-time updates"""
    
    headers = {"api-key": api_key}
    status_url = f"https://visionapi.unsiloed.ai/jobs/{job_id}"
    
    async with aiohttp.ClientSession() as session:
        while True:
            async with session.get(status_url, headers=headers) as response:
                if response.status == 200:
                    status_data = await response.json()
                    
                    if callback:
                        callback(status_data)
                    
                    if status_data['status'] == 'completed':
                        # Get results
                        results_url = f"https://visionapi.unsiloed.ai/jobs/{job_id}/results"
                        async with session.get(results_url, headers=headers) as results_response:
                            if results_response.status == 200:
                                return await results_response.json()
                            else:
                                raise Exception(f"Failed to get results: {await results_response.text()}")
                                
                    elif status_data['status'] == 'failed':
                        raise Exception(f"Job failed: {status_data.get('error', 'Unknown error')}")

                elif 400 <= response.status < 500:
                    # Client errors will not resolve on retry
                    raise Exception(f"Status check failed: {await response.text()}")

            await asyncio.sleep(3)  # Check every 3 seconds

def status_callback(status_data):
    """Callback function for status updates"""
    print(f"Job status: {status_data['status']}")
    if 'progress' in status_data:
        print(f"Progress: {status_data['progress']}%")

# Usage
async def main():
    job_id = "e4a73f3c-869e-4129-bdef-614a6a51b043"
    results = await monitor_table_extraction(job_id, "your-api-key", status_callback)
    print("Extraction completed!")
    print(f"Found {len(results['results']['tables'])} tables")

# Run the async function
# asyncio.run(main())

Rate Limits

  • Table Extraction: 50 requests per minute
  • File Upload: 500MB per minute total
  • Concurrent Jobs: Maximum 5 active table extraction jobs per API key
  • Results Retrieval: 200 requests per minute

Rate limits are enforced per API key and reset on a rolling window basis. Monitor your quota usage through the quota_remaining field in responses.
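
The concurrent-job limit can be respected client-side by submitting in batches of at most five and draining each batch before the next, while watching quota_remaining. A minimal sketch using the helpers defined earlier on this page (the warning threshold is an arbitrary example):

def process_in_batches(pdf_files, api_key, max_active=5, warn_below=100):
    """Stay within the 5 concurrent table extraction jobs per API key."""
    all_results = []
    
    for i in range(0, len(pdf_files), max_active):
        batch = pdf_files[i:i + max_active]
        
        # Submit up to max_active jobs, checking quota as we go
        jobs = [submit_table_job(f, api_key) for f in batch]
        for job in jobs:
            if job.get("quota_remaining", warn_below) < warn_below:
                print(f"Warning: {job['quota_remaining']} API calls remaining")
        
        # Drain the batch before submitting more
        for job in jobs:
            all_results.append(
                wait_for_table_extraction_completion(job["job_id"], api_key)
            )
    
    return all_results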

Supported Document Types

  • PDF: Native PDF documents with embedded tables
  • Scanned PDFs: Image-based PDFs processed with OCR
  • Multi-page Documents: Documents with tables spanning multiple pages
  • Complex Layouts: Documents with mixed content including tables

Performance Optimization

  • Document Preparation: Ensure tables have clear borders and consistent formatting
  • File Size: Optimize PDF file size while maintaining table clarity
  • Batch Processing: For multiple documents, process them sequentially to avoid rate limits
  • Result Caching: Cache results for frequently accessed tables to reduce API calls (see the sketch below)
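
A minimal caching sketch, keyed on the file's content hash so a changed document is re-extracted (the cache location and helper name are illustrative, not part of the API):

import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".table_cache")  # illustrative location

def extract_tables_cached(pdf_path, api_key):
    """Return cached results when the same file content was already processed."""
    CACHE_DIR.mkdir(exist_ok=True)
    
    # Key the cache on file contents, not the filename
    digest = hashlib.sha256(Path(pdf_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    
    tables = extract_complex_tables(pdf_path, api_key)  # defined above
    cache_file.write_text(json.dumps(tables))
    return tables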