DocExtract API Documentation
Convert PDF files into structured data using AI-powered parsing with Django integration.
Base URL
POST https://docextract.ai/api/v2/docextract/pdf/
All API requests should be made to this endpoint using the POST method.
Authentication
Include your API key in every request for authentication. Get your API keys at docextract.ai/manage-api
api_key = "ABCDE*****"
| Field | Location | Required | Description |
|---|---|---|---|
| api_key | Form Data | Required | Unique key issued per organization |
Request Parameters
Content-Type: multipart/form-data
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Required | - | The PDF file to process |
| api_key | String | Required | - | Organization's API key |
| formats | List | Optional | - | List of column headers |
| output_format | String | Optional | json | Output format: json or html_table |
| keep_records | Boolean | Optional | False | Whether to store the extracted data in DocExtract; when enabled, the response also returns a list of image URLs |
| output_image_format | String | Optional | - | Set to 'base64' to receive the image as a base64-encoded string |
| combined_records | Boolean | Optional | - | Set to true to receive a combined result |
Example: cURL Request
curl -X POST https://docextract.ai/api/v2/docextract/pdf/ \
  -F "file=@invoice.pdf" \
  -F "api_key=ABCDE*****" \
  -F "output_format=json" \
  -F "keep_records=true" \
  -F "output_image_format=base64"
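For callers who prefer Python, the following is a minimal sketch of the same kind of request built with the standard library only. The endpoint comes from the Base URL section and the field names from the Request Parameters table; the helper names `encode_multipart` and `build_request` are hypothetical, and the file bytes are a placeholder rather than a real PDF. The request is built but not sent.

```python
import urllib.request
import uuid

# Endpoint taken from the Base URL section.
API_URL = "https://docextract.ai/api/v2/docextract/pdf/"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file (under 'file') as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode())
    parts.append(f"--{boundary}".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode()
    )
    parts.append(b"Content-Type: application/pdf")
    parts.append(b"")
    parts.append(file_bytes)
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, filename, file_bytes, **options):
    """Build (but do not send) a POST request for the /pdf/ endpoint."""
    fields = {"api_key": api_key, **options}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Placeholder bytes stand in for the contents of a real PDF file.
request = build_request(
    "ABCDE*****",
    "invoice.pdf",
    b"%PDF-1.7 placeholder bytes",
    output_format="json",
    keep_records="true",
)
```

Passing `request` to `urllib.request.urlopen` would send it; that call is left out here so the sketch can be run without network access or a valid key.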
Response Schema
Success Response (200 OK)
Returned upon successful document processing.
{
  "status": "Success",
  "filename": "sample_invoice.pdf",
  "data": {
    "image": "<base64-Image>",
    "analysis": "<dataframe or json>"
  },
  "pages": 3,
  "remaining_docs": 297,
  "size": 156,
  "pdf_url": "<s3 pdf url>",
  "s3_csv_url": "<s3 csv url>",
  "s3_image_urls": ["<s3 image url>"],
  "combined_data": ["<combined data>"]
}
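A success response might be unpacked along these lines. This is a sketch that assumes the field names shown in the schema above and that `data.image`, when present, is a plain base64 string; the `ExtractionResult` class and `parse_success` function are hypothetical names, and the sample payload is made up for illustration.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractionResult:
    filename: str
    pages: int
    remaining_docs: int
    analysis: Any
    image_bytes: Optional[bytes]


def parse_success(payload: str) -> ExtractionResult:
    """Parse a 200 OK response body into a small typed record."""
    body = json.loads(payload)
    if body.get("status") != "Success":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    data = body.get("data", {})
    image = data.get("image")
    return ExtractionResult(
        filename=body["filename"],
        pages=body["pages"],
        remaining_docs=body["remaining_docs"],
        analysis=data.get("analysis"),
        # Decode only when the server actually returned an image string.
        image_bytes=base64.b64decode(image) if image else None,
    )


# Made-up payload shaped like the documented schema.
sample = json.dumps({
    "status": "Success",
    "filename": "sample_invoice.pdf",
    "data": {
        "image": base64.b64encode(b"fake image bytes").decode("ascii"),
        "analysis": {"Total": "1,250.00"},
    },
    "pages": 3,
    "remaining_docs": 297,
})
result = parse_success(sample)
```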
Common Error Responses
Invalid API Key (401 Unauthorized)
Returned when the API key fails server-side validation.
{
"error": "[APIKeyValidationError 401]: Unexpected server error during API key validation."
}
Missing Form Fields (400 Bad Request)
Returned if the file or api_key fields are missing from the request. This can also occur if the file field is present but has no data.
{
"file": ["This field is required."],
"api_key": ["This field is required."]
}
Unsupported File Format (400 Bad Request)
Returned when a file with an unsupported extension is sent.
{
"error": "Unsupported file format: .txt"
}
Invalid File for Endpoint (400 Bad Request)
Returned when an incorrect file type is sent to an endpoint (e.g., an image file to the /pdf/ endpoint).
{
"error": "File processing failed"
}
Usage Limit Exceeded
Returned when the monthly page limit for the API key has been exceeded.
{
"error": "[APIKeyValidationError 401]: Monthly page limit exceeded."
}
File Identification Error (500 Internal Server Error)
A server-side error returned when the backend fails to identify a file's content type (e.g., sending a non-image file to the /image/ endpoint).
{
"status": "error",
"message": "Processing failed: cannot identify image file <_io.BytesIO object at 0x...>"
}
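The error bodies above come in three shapes: a single `error` string, per-field validation lists, and a `status`/`message` pair. A client could flatten them into one list of messages roughly as follows. The function names are hypothetical, and the quota check relies on the message text shown above, which is an assumption about its stability.

```python
from typing import Any, Dict, List


def error_messages(body: Dict[str, Any]) -> List[str]:
    """Flatten the documented error shapes into a list of readable messages."""
    # Shape 1: {"error": "..."}
    if "error" in body:
        return [str(body["error"])]
    # Shape 2: {"status": "error", "message": "..."}
    if body.get("status") == "error" and "message" in body:
        return [str(body["message"])]
    # Shape 3: per-field lists, e.g. {"file": ["This field is required."]}
    messages = []
    for field, problems in body.items():
        if isinstance(problems, list):
            messages.extend(f"{field}: {problem}" for problem in problems)
    return messages


def is_quota_error(body: Dict[str, Any]) -> bool:
    """Heuristic: match the documented 'Monthly page limit exceeded' text."""
    return any("Monthly page limit exceeded" in m for m in error_messages(body))


missing = error_messages({
    "file": ["This field is required."],
    "api_key": ["This field is required."],
})
```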
Processing Workflow
- API Key Validation - Verify authentication credentials
- File Reading & Format Check - Validate PDF file integrity
- Page Count & Quota Check - Verify available credits
- Concurrent Page Processing - Asynchronous AI parsing
- Data Aggregation - Compile results in JSON/DataFrame format
- S3 Upload - Store results if enabled
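The format check in the workflow runs on the server, but a client can catch obvious mistakes before uploading. The sketch below is a client-side approximation, not the service's own validation: it checks the `.pdf` extension and the `%PDF-` signature that PDF files begin with. The function name `looks_like_pdf` is hypothetical, and the check is deliberately strict about the signature sitting at byte 0.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(filename: str, content: bytes) -> bool:
    """Return True if the name ends in .pdf and the bytes start with %PDF-."""
    has_pdf_extension = filename.lower().endswith(".pdf")
    has_pdf_signature = content.startswith(PDF_SIGNATURE)
    return has_pdf_extension and has_pdf_signature


# A text file and a PNG renamed to .pdf are both rejected.
checks = [
    looks_like_pdf("invoice.pdf", b"%PDF-1.7\n"),
    looks_like_pdf("notes.txt", b"plain text"),
    looks_like_pdf("scan.pdf", b"\x89PNG\r\n\x1a\n"),
]
```

Passing this check does not guarantee the server will accept the file; it only filters out inputs that would likely trigger the unsupported-format or file-processing errors.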
Quota Management
| Metric | Description |
|---|---|
| credits | Pages remaining in current billing cycle |
| monthly_page_processed | Total pages processed this month |
| storage_used | Bytes stored in S3 storage |
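A client that tracks these metrics locally could check whether a document fits the remaining credits before uploading it. This is only a sketch: the `Quota` class is hypothetical, and it assumes each processed page consumes exactly one credit, which follows from the description of `credits` as pages remaining but is not stated outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    credits: int                 # pages remaining in the current billing cycle
    monthly_page_processed: int  # total pages processed this month
    storage_used: int            # bytes stored in S3

    def can_process(self, pages: int) -> bool:
        """True if the remaining credits cover a document of this many pages."""
        return 0 <= pages <= self.credits

    def after_processing(self, pages: int) -> "Quota":
        """Return the expected quota after processing, assuming 1 credit per page."""
        if not self.can_process(pages):
            raise ValueError("not enough credits for this document")
        return Quota(
            credits=self.credits - pages,
            monthly_page_processed=self.monthly_page_processed + pages,
            storage_used=self.storage_used,
        )


quota = Quota(credits=300, monthly_page_processed=0, storage_used=0)
updated = quota.after_processing(3)
```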
Security & Encryption
- Unique API keys per organization
- Sanitized file handling and validation
- End-to-end encryption for extracted data
- Built-in quota and abuse protection
Sample Output
{
"image": "<base64-Image>",
"analysis": {
"Invoice Number": "INV-2034",
"Total": "1,250.00",
"Date": "2024-01-15",
"Vendor": "Acme Corp"
}
}
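The values in `analysis` arrive as strings, so numeric and date fields usually need converting before use. The sketch below handles the two formats that appear in the sample above (a comma-grouped amount and an ISO date). The helper names are hypothetical, and real documents may use other formats that this does not cover.

```python
from datetime import date
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a comma-grouped amount such as '1,250.00' to a Decimal."""
    return Decimal(text.replace(",", ""))


def parse_iso_date(text: str) -> date:
    """Convert a 'YYYY-MM-DD' string to a date."""
    return date.fromisoformat(text)


analysis = {
    "Invoice Number": "INV-2034",
    "Total": "1,250.00",
    "Date": "2024-01-15",
    "Vendor": "Acme Corp",
}
total = parse_amount(analysis["Total"])
invoice_date = parse_iso_date(analysis["Date"])
```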