DocExtract API Documentation
Convert PDF files into structured data using AI-powered parsing with Django integration.
Base URL
POST https://docextract.ai/api/v1/docextract/pdf/
All API requests should be made to this endpoint using the POST method.
Authentication
Include your API key in every request. Get your API keys at docextract.ai/manage-api.
api_key = "ABCDE*****"
Field | Location | Required | Description |
---|---|---|---|
api_key | Form Data | Required | Unique key issued per organization |
Request Parameters
Content-Type: multipart/form-data
Field | Type | Required | Default | Description |
---|---|---|---|---|
file | File | Required | - | The PDF file to process |
api_key | String | Required | - | Organization's API key |
formats | List | Optional | - | List of column headers |
output_format | String | Optional | json | Output format: json or html_table |
keep_records | Boolean | Optional | False | Whether to store the extracted data in DocExtract; when enabled, the response also returns a list of image URLs |
output_image_format | String | Optional | - | Set to 'base64' to receive images as base64-encoded strings |
combined_records | Boolean | Optional | - | Set to true to receive a combined result |
Example: cURL Request
curl -X POST https://docextract.ai/api/v1/docextract/pdf/ \
  -F "file=@invoice.pdf" \
  -F "api_key=ABCDE*****" \
  -F "format=dataframe" \
  -F "keep_records=true" \
  -F "output_image_format=base64"
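The same kind of request can be prepared from Python. The following is a minimal sketch that uses only the standard library and assembles the multipart/form-data body by hand. The helper names `encode_multipart` and `build_request` are illustrative and not part of DocExtract, and the PDF content type on the file part is an assumption.

```python
import uuid
import urllib.request

API_URL = "https://docextract.ai/api/v1/docextract/pdf/"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one PDF under the "file" field.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain form field (api_key, keep_records, ...).
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries the raw PDF bytes.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/pdf\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, filename, file_bytes, **options):
    """Build (but do not send) a POST request for the /pdf/ endpoint."""
    fields = {"api_key": api_key, **options}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending the prepared request is then a call to `urllib.request.urlopen(request)`, and the response body can be decoded with `json.load`. Optional parameters are passed as keyword arguments, for example `build_request(key, "invoice.pdf", data, keep_records="true")`.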
Response Schema
Success Response (200 OK)
Returned upon successful document processing.
{
  "status": "Success",
  "filename": "sample_invoice.pdf",
  "pages": 3,
  "remaining_docs": 297,
  "size": 156,
  "data": [
    // Array of page objects...
  ],
  "combined_data": [
    // Array of combined data...
  ],
  "s3_image_urls": [
    // List of image URLs
  ],
  "s3_csv_url": "<csv url>"
}
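Once the body has been decoded from JSON, the top-level fields can be read directly. The sketch below assumes the response is already a Python dict with the fields shown above. `summarize_response` is a hypothetical helper, and treating `combined_data`, `s3_image_urls`, and `s3_csv_url` as possibly absent is an assumption, since the schema does not say when they are included.

```python
def summarize_response(payload):
    """Pull the documented top-level fields out of a success response."""
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return {
        "filename": payload["filename"],
        "pages": payload["pages"],
        "remaining_docs": payload["remaining_docs"],
        "page_results": payload.get("data", []),
        # Assumed optional: fall back to empty values when a key is missing.
        "combined": payload.get("combined_data", []),
        "image_urls": payload.get("s3_image_urls", []),
        "csv_url": payload.get("s3_csv_url"),
    }
```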
Common Error Responses
Invalid API Key (401 Unauthorized)
Returned when the API key fails server-side validation.
{ "error": "[APIKeyValidationError 401]: Unexpected server error during API key validation." }
Missing Form Fields (400 Bad Request)
Returned if the file or api_key fields are missing from the request. This can also occur if the file field is present but has no data.
{ "file": ["This field is required."], "api_key": ["This field is required."] }
Unsupported File Format (400 Bad Request)
Returned when a file with an unsupported extension is sent.
{ "error": "Unsupported file format: .txt" }
Invalid File for Endpoint (400 Bad Request)
Returned when an incorrect file type is sent to an endpoint (e.g., an image file to the /pdf/ endpoint).
{ "error": "File processing failed" }
Usage Limit Exceeded
Returned when the monthly page limit for the API key has been exceeded.
{ "error": "Monthly page limit exceeded." }
File Identification Error (500 Internal Server Error)
A server-side error returned when the backend fails to identify a file's content type (e.g., sending a non-image file to the /image/ endpoint).
{ "status": "error", "message": "Processing failed: cannot identify image file <_io.BytesIO object at 0x...>" }
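The error bodies above come in three shapes: a single `error` string, a `status`/`message` pair, and a map of field names to lists of validation messages. A client may want to normalize them into one list of messages. The sketch below is one way to do that; `extract_error` is a hypothetical helper, and the fallback for non-JSON bodies is an assumption rather than documented behavior.

```python
import json


def extract_error(status_code, body_text):
    """Return a list of human-readable messages from an error response body."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Not JSON (for example a proxy error page): report the raw text.
        return [f"HTTP {status_code}: {body_text.strip()}"]
    if not isinstance(payload, dict):
        return [f"HTTP {status_code}: {payload!r}"]

    # Shape 1: {"error": "..."}
    if "error" in payload:
        return [str(payload["error"])]
    # Shape 2: {"status": "error", "message": "..."}
    if payload.get("status") == "error" and "message" in payload:
        return [str(payload["message"])]
    # Shape 3: field-level validation errors, {"field": ["msg", ...]}
    messages = []
    for field, errors in payload.items():
        if isinstance(errors, list):
            messages.extend(f"{field}: {msg}" for msg in errors)
    return messages or [f"HTTP {status_code}: {body_text.strip()}"]
```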
Processing Workflow
- API Key Validation - Verify authentication credentials
- File Reading & Format Check - Validate PDF file integrity
- Page Count & Quota Check - Verify available credits
- Concurrent Page Processing - Asynchronous AI parsing
- Data Aggregation - Compile results in JSON/DataFrame format
- S3 Upload - Store results if enabled
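The first checks in this workflow can be mirrored on the client before uploading, which avoids a round trip for requests that would be rejected anyway. The sketch below is a client-side illustration only and does not describe how the service itself validates files. `preflight_errors` is a hypothetical helper, and the check relies on the fact that PDF files begin with the bytes `%PDF-`.

```python
from pathlib import Path


def preflight_errors(api_key, path):
    """Return a list of problems found before uploading; empty means OK."""
    problems = []
    if not api_key:
        problems.append("api_key is missing")
    p = Path(path)
    if p.suffix.lower() != ".pdf":
        problems.append(f"unsupported file extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append(f"file not found: {p}")
    else:
        # Read only the first bytes to check for the PDF signature.
        with p.open("rb") as fh:
            header = fh.read(5)
        if not header:
            problems.append("file is empty")
        elif header != b"%PDF-":
            problems.append("file does not start with the %PDF- signature")
    return problems
```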
Quota Management
Metric | Description |
---|---|
credits | Pages remaining in current billing cycle |
monthly_page_processed | Total pages processed this month |
storage_used | Bytes stored in S3 storage |
Security & Encryption
- Unique API keys per organization
- Sanitized file handling and validation
- End-to-end encryption for extracted data
- Built-in quota and abuse protection
Sample Output
{
  "image": "<base64-Image>",
  "analysis": {
    "Invoice Number": "INV-2034",
    "Total": "1,250.00",
    "Date": "2024-01-15",
    "Vendor": "Acme Corp"
  }
}
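A record like the one above can be unpacked in a few lines. This is a minimal sketch assuming the `image` value is plain base64 text (no data-URI prefix) and that amounts use a comma as the thousands separator, as in the sample. `unpack_record` and `parse_amount` are hypothetical helpers.

```python
import base64
from decimal import Decimal


def unpack_record(record):
    """Split one record into (image_bytes, analysis_dict).

    image_bytes is None when the record carries no "image" value.
    """
    encoded = record.get("image")
    image_bytes = base64.b64decode(encoded) if encoded else None
    return image_bytes, record.get("analysis", {})


def parse_amount(text):
    """Convert a formatted amount such as "1,250.00" to a Decimal."""
    return Decimal(text.replace(",", ""))
```

Using `Decimal` rather than `float` keeps monetary values exact when they are summed or compared later.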