DocExtract API Documentation

Convert PDF files into structured data using AI-powered parsing with Django integration.

API v2

Base URL

POST

https://docextract.ai/api/v2/docextract/pdf/

All API requests should be made to this endpoint using the POST method.

Authentication

Include your API key in every request. Get your API keys at docextract.ai/manage-api.

api_key = "ABCDE*****"

Field	Location	Required	Description
`api_key`	Form Data	Required	Unique key issued per organization

Request Parameters

Content-Type: multipart/form-data

Field	Type	Required	Default	Description
`file`	File	Required	-	The PDF file to process
`api_key`	String	Required	-	Organization's API key
`formats`	List	Optional	-	List of column headers to extract; must be a valid array (e.g., ["quantity", "amount"])
`output_format`	String	Optional	json	Output format: `json` or `html_table`
`keep_records`	Boolean	Optional	False	Whether to store the extracted data in DocExtract; when enabled, the response also returns a list of image URLs
`combined_records`	Boolean	Optional	-	Whether to return a combined result

Example: cURL Request

curl -X POST https://docextract.ai/api/v2/docextract/pdf/ \
  -F "file=@invoice.pdf" \
  -F "api_key=ABCDE*****" \
  -F "format=dataframe" \
  -F "keep_records=true" \
  -F "output_image_format=base64"
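
Example: Python Request (sketch)

The same request can be assembled in Python with only the standard library. This is a minimal sketch, not an official client: the helper name build_request is hypothetical, and sending formats as a JSON array string is an assumption based on the "valid array" wording in the parameter table. The function only builds the request; nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import uuid
import urllib.request

# Base URL taken from this documentation.
API_URL = "https://docextract.ai/api/v2/docextract/pdf/"


def build_request(pdf_bytes, filename, api_key, formats=None, keep_records=False):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    fields = {
        "api_key": api_key,
        "keep_records": "true" if keep_records else "false",
    }
    if formats is not None:
        # Assumption: the list of column headers is sent as a JSON array string.
        fields["formats"] = json.dumps(formats)

    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    # The PDF itself, sent under the required "file" field.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode()
        + pdf_bytes
        + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the PDF in binary mode, call build_request, and pass the result to urllib.request.urlopen; the response body can then be decoded with json.loads.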

Response Schema

Success Response (200 OK)

Returned upon successful document processing.

{
    "status": "Success",
    "filename": "sample_invoice.pdf",
    "data": {
        "image": "<base64-Image>",
        "analysis": "<dataframe or json>"
    },
    "pages": 3,
    "remaining_docs": 297,
    "size": 156,
    "pdf_url": "<s3 pdf url>",
    "s3_csv_url": "<s3 csv url>",
    "s3_image_urls": [
        "<s3 image url>"
    ],
    "combined_data": [
        "<combined data entry>"
    ]
}
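
A client will usually pull only a few of these fields out of the decoded response. The sketch below assumes the body has already been parsed into a Python dict and that optional keys such as s3_image_urls may be absent; the function name summarize_response is illustrative, not part of the API.

```python
def summarize_response(payload):
    """Pick the commonly needed fields out of a decoded success response."""
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")

    data = payload.get("data") or {}
    return {
        "filename": payload.get("filename"),
        "pages": payload.get("pages"),
        # Pages left in the quota after this request.
        "remaining_docs": payload.get("remaining_docs"),
        "analysis": data.get("analysis"),
        # Assumption: the list may be missing when records are not kept.
        "image_urls": payload.get("s3_image_urls") or [],
    }
```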

Common Error Responses

Invalid API Key (401 Unauthorized)

Returned when the API key fails server-side validation.

{
    "error": "[APIKeyValidationError 401]: Unexpected server error during API key validation."
}

Missing Form Fields (400 Bad Request)

Returned if the file or api_key fields are missing from the request. This can also occur if the file field is present but has no data.

{
    "file": ["This field is required."],
    "api_key": ["This field is required."]
}

Unsupported File Format (400 Bad Request)

Returned when a file with an unsupported extension is sent.

{
    "error": "Unsupported file format: .txt"
}

Invalid File for Endpoint (400 Bad Request)

Returned when an incorrect file type is sent to an endpoint (e.g., an image file to the /pdf/ endpoint).

{
    "error": "File processing failed"
}

Usage Limit Exceeded

Returned when the monthly page limit for the API key has been exceeded.

{
    "error": "[APIKeyValidationError 401]: Monthly page limit exceeded."
}

File Identification Error (500 Internal Server Error)

A server-side error returned when the backend fails to identify a file's content type (e.g., sending a non-image file to the /image/ endpoint).

{
    "status": "error",
    "message": "Processing failed: cannot identify image file <_io.BytesIO object at 0x...>"
}
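
The error bodies above come in three shapes: a single "error" string, a per-field dictionary of validation messages, and a "status"/"message" pair. A minimal sketch of normalizing all three into a flat list of messages, assuming the body is already decoded into a dict (the helper name error_messages is hypothetical):

```python
def error_messages(body):
    """Flatten any of the documented error shapes into a list of strings."""
    messages = []
    if "error" in body:
        # {"error": "..."}
        messages.append(str(body["error"]))
    elif body.get("status") == "error":
        # {"status": "error", "message": "..."}
        messages.append(str(body.get("message", "")))
    else:
        # {"file": ["This field is required."], ...}
        for field, errors in body.items():
            if isinstance(errors, list):
                for err in errors:
                    messages.append(f"{field}: {err}")
    return messages
```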

Processing Workflow

API Key Validation - Verify authentication credentials
File Reading & Format Check - Validate PDF file integrity
Page Count & Quota Check - Verify available credits
Concurrent Page Processing - Asynchronous AI parsing
Data Aggregation - Compile results in JSON/DataFrame format
S3 Upload - Store results if enabled
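
To make the ordering of these steps concrete, here is a purely illustrative sketch of the quota check followed by concurrent page processing and aggregation. It is not the service's actual implementation: parse_page is a hypothetical stand-in for the AI parsing step, and the pages are plain strings rather than real PDF pages.

```python
import asyncio


async def parse_page(page_number, page_text):
    """Hypothetical stand-in for the AI parsing of a single page."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"page": page_number, "chars": len(page_text)}


async def process_document(pages, credits):
    """Check the quota, parse all pages concurrently, then aggregate."""
    # Page Count & Quota Check
    if len(pages) > credits:
        raise RuntimeError("Monthly page limit exceeded.")

    # Concurrent Page Processing: gather returns results in input order.
    results = await asyncio.gather(
        *(parse_page(i, text) for i, text in enumerate(pages, start=1))
    )

    # Data Aggregation
    return {"pages": len(pages), "results": list(results)}
```

Running asyncio.run(process_document(["a", "bcd"], credits=5)) returns a dict with pages set to 2 and one result per page, in page order.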

Quota Management

Metric	Description
`credits`	Pages remaining in current billing cycle
`monthly_page_processed`	Total pages processed this month
`storage_used`	Bytes stored in S3 storage

Security & Encryption

Unique API keys per organization
Sanitized file handling and validation
End-to-end encryption for extracted data
Built-in quota and abuse protection

Sample Output

{
  "image": "<base64-Image>",
  "analysis": {
    "Invoice Number": "INV-2034",
    "Total": "1,250.00",
    "Date": "2024-01-15",
    "Vendor": "Acme Corp"
  }
}
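
Values in analysis arrive as strings, so downstream code will typically convert them before use. A minimal sketch for the sample above, assuming these four keys are present, the total uses commas as thousands separators, and the date is in ISO format (field names depend on the document and on the formats requested):

```python
from datetime import date
from decimal import Decimal


def normalize_analysis(analysis):
    """Convert the string values of the sample output into typed values."""
    return {
        "invoice_number": analysis["Invoice Number"],
        # "1,250.00" -> Decimal("1250.00"); Decimal avoids float rounding.
        "total": Decimal(analysis["Total"].replace(",", "")),
        # "2024-01-15" -> date(2024, 1, 15)
        "date": date.fromisoformat(analysis["Date"]),
        "vendor": analysis["Vendor"],
    }
```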