DocExtract API Documentation

Convert PDF files into structured data using AI-powered parsing with Django integration.

API v1 Django Compatible

Base URL

POST https://docextract.ai/api/v1/docextract/pdf/

All API requests should be made to this endpoint using the POST method.

Authentication

Include your API key in every request. You can obtain API keys at docextract.ai/manage-api.

api_key = "ABCDE*****"

Field    Location   Required  Description
api_key  Form Data  Required  Unique key issued per organization

Request Parameters

Content-Type: multipart/form-data
Field                Type     Required  Default  Description
file                 File     Required  -        The PDF file to process
api_key              String   Required  -        Organization's API key
formats              List     Optional  -        List of column headers
output_format        String   Optional  json     Output format: json or html_table
keep_records         Boolean  Optional  False    Whether to store the extracted data in DocExtract; when enabled, the response also includes a list of image URLs
output_image_format  String   Optional  -        Set to 'base64' to receive images as base64-encoded strings
combined_records     Boolean  Optional  -        Whether to include the combined result in the response

Example: cURL Request

curl -X POST https://docextract.ai/api/v1/docextract/pdf/ \
  -F "file=@invoice.pdf" \
  -F "api_key=ABCDE*****" \
  -F "format=dataframe" \
  -F "keep_records=true" \
  -F "output_image_format=base64"
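
A comparable request can be sketched in Python. This is a minimal sketch, assuming the third-party requests library is installed and that the default json output format is used; the helper names build_form_fields and extract_pdf are hypothetical and not part of DocExtract. Booleans are sent as the strings true/false, mirroring the cURL example. How list-valued fields such as formats should be encoded is not stated here, so the sketch leaves them out.

```python
API_URL = "https://docextract.ai/api/v1/docextract/pdf/"


def build_form_fields(api_key, output_format="json", keep_records=False,
                      output_image_format=None, combined_records=None):
    """Build the non-file multipart form fields for a request."""
    fields = {
        "api_key": api_key,
        "output_format": output_format,
        # Booleans are sent as lowercase strings, as in the cURL example.
        "keep_records": "true" if keep_records else "false",
    }
    # Optional fields without a documented default are only sent when set.
    if output_image_format is not None:
        fields["output_image_format"] = output_image_format
    if combined_records is not None:
        fields["combined_records"] = "true" if combined_records else "false"
    return fields


def extract_pdf(pdf_path, api_key, timeout=120, **options):
    """POST a PDF to the endpoint and return the decoded JSON body."""
    # Third-party dependency, imported here so build_form_fields stays stdlib-only.
    import requests

    with open(pdf_path, "rb") as pdf:
        response = requests.post(
            API_URL,
            data=build_form_fields(api_key, **options),
            files={"file": pdf},
            timeout=timeout,
        )
    # Raises requests.HTTPError for any 4xx/5xx status.
    response.raise_for_status()
    return response.json()
```

A call such as extract_pdf("invoice.pdf", "ABCDE*****", keep_records=True) would then return the decoded response body as a dictionary.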

Response Schema

Success Response (200 OK)

Returned upon successful document processing.

{
    "status": "Success",
    "filename": "sample_invoice.pdf",
    "pages": 3,
    "remaining_docs": 297,
    "size": 156,
    "data": [
        // Array of page objects...
    ],
    "combined_data": [
        // Array of combined data...
    ],
    "s3_image_urls": [
        // List of image URLs...
    ],
    "s3_csv_url": "<csv url>"
}
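
Once decoded, the success response is a plain dictionary. The sketch below shows one way to pull out its top-level fields; the function name summarize_response is hypothetical, and treating data, s3_image_urls, and s3_csv_url as optional keys is an assumption about how the response varies with the request options.

```python
def summarize_response(payload):
    """Pull the top-level fields out of a decoded success response."""
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return {
        "filename": payload["filename"],
        "pages": payload["pages"],
        "remaining_docs": payload["remaining_docs"],
        # Assumption: these keys may be absent depending on request options.
        "page_objects": len(payload.get("data", [])),
        "image_urls": payload.get("s3_image_urls", []),
        "csv_url": payload.get("s3_csv_url"),
    }
```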

Common Error Responses

Invalid API Key (401 Unauthorized)

Returned when the API key fails server-side validation.

{
    "error": "[APIKeyValidationError 401]: Unexpected server error during API key validation."
}

Missing Form Fields (400 Bad Request)

Returned if the file or api_key fields are missing from the request. This can also occur if the file field is present but has no data.

{
    "file": ["This field is required."],
    "api_key": ["This field is required."]
}

Unsupported File Format (400 Bad Request)

Returned when a file with an unsupported extension is sent.

{
    "error": "Unsupported file format: .txt"
}

Invalid File for Endpoint (400 Bad Request)

Returned when an incorrect file type is sent to an endpoint (e.g., an image file to the /pdf/ endpoint).

{
    "error": "File processing failed"
}

Usage Limit Exceeded

Returned when the monthly page limit for the API key has been exceeded.

{
    "error": "Monthly page limit exceeded."
}

File Identification Error (500 Internal Server Error)

A server-side error returned when the backend fails to identify a file's content type (e.g., sending a non-image file to the /image/ endpoint).

{
    "status": "error",
    "message": "Processing failed: cannot identify image file <_io.BytesIO object at 0x...>"
}
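
The error bodies above come in three shapes: an error string, a per-field list of validation messages, and a status/message pair. A client can normalize them into one message, as in this minimal sketch; the function name describe_error is hypothetical, and the shapes handled are only the ones shown in this section.

```python
def describe_error(status_code, body):
    """Return a one-line description for any of the documented error bodies."""
    if "error" in body:
        # Shape: {"error": "..."}
        detail = body["error"]
    elif body.get("status") == "error":
        # Shape: {"status": "error", "message": "..."}
        detail = body.get("message", "")
    else:
        # Shape: {"field": ["message", ...], ...}
        parts = []
        for field, messages in body.items():
            if isinstance(messages, list):
                text = " ".join(str(m) for m in messages)
            else:
                text = str(messages)
            parts.append(f"{field}: {text}")
        detail = "; ".join(parts)
    return f"HTTP {status_code}: {detail}"
```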

Processing Workflow

  1. API Key Validation - Verify authentication credentials
  2. File Reading & Format Check - Validate PDF file integrity
  3. Page Count & Quota Check - Verify available credits
  4. Concurrent Page Processing - Asynchronous AI parsing
  5. Data Aggregation - Compile results in JSON/DataFrame format
  6. S3 Upload - Store results if enabled
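
Steps 3 to 5 can be illustrated with a small asyncio sketch. This is not DocExtract's actual implementation: parse_page is a hypothetical stand-in for the AI parsing step, and process_document and its parameters are names invented for the illustration.

```python
import asyncio


async def parse_page(page_number):
    """Hypothetical stand-in for the AI parsing of one page."""
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return {"page": page_number, "analysis": {}}


async def process_document(page_count, credits):
    """Check the quota, parse pages concurrently, then aggregate the results."""
    # Step 3: refuse the document if it needs more pages than remain.
    if page_count > credits:
        raise RuntimeError("Monthly page limit exceeded.")
    # Step 4: schedule one coroutine per page and wait for all of them.
    pages = await asyncio.gather(
        *(parse_page(n) for n in range(1, page_count + 1))
    )
    # Step 5: gather returns results in input order, so pages stay sorted.
    return {"status": "Success", "pages": page_count, "data": pages}
```

Running asyncio.run(process_document(3, credits=10)) yields a dictionary whose data list holds one entry per page, in page order.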

Quota Management

Metric                  Description
credits                 Pages remaining in current billing cycle
monthly_page_processed  Total pages processed this month
storage_used            Bytes stored in S3 storage

Security & Encryption

  • Unique API keys per organization
  • Sanitized file handling and validation
  • End-to-end encryption for extracted data
  • Built-in quota and abuse protection

Sample Output

{
  "image": "<base64-Image>",
  "analysis": {
    "Invoice Number": "INV-2034",
    "Total": "1,250.00",
    "Date": "2024-01-15",
    "Vendor": "Acme Corp"
  }
}
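
When output_image_format is set to base64, the image field carries the image as a base64 string. The sketch below decodes it to raw bytes using only the standard library; the function name decode_image is hypothetical, and the handling of a data-URI prefix is a defensive assumption, not documented behavior.

```python
import base64


def decode_image(record):
    """Return the raw image bytes from a record's base64 "image" field."""
    encoded = record["image"]
    # Assumption: tolerate an optional prefix such as "data:image/png;base64,".
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)
```

The returned bytes can be written to disk with open(path, "wb") or passed to an imaging library.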