DocExtract Digitization API Documentation

Convert raw PDFs into fully editable, structured PDF and DOCX files using DocExtract’s AI-powered digitisation API.

API v1

Base URL

POST
https://docextract.ai/api/v1/docextract/digitised/pdf/

All API requests should be made to this endpoint using the POST method.

Authentication

Include your API key in every request for authentication. Get your API keys at docextract.ai/manage-api

api_key = "ABCDE*****"
Field    Location   Required  Description
api_key  Form Data  Required  Unique key issued per organization
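
Since the key is sent with every request, it helps to keep it out of source code. The following is a minimal sketch, assuming the key is stored in an environment variable; the variable name DOCEXTRACT_API_KEY and both function names are illustrative choices, not something DocExtract defines.

```python
import os


def load_api_key(env_var: str = "DOCEXTRACT_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your DocExtract API key")
    return key


def auth_form_field(api_key: str) -> dict:
    """The key travels as an ordinary form field named api_key, not as a header."""
    return {"api_key": api_key}
```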

Request Parameters

Content-Type: multipart/form-data
Field          Type     Required  Default   Description
file           File     Required  -         The PDF file to process.
api_key        String   Required  -         Organization's API key.
language       String   Optional  original  Language of the document. The value must be a single supported language name in lowercase, for example english, hindi, or original.
keep_records   Boolean  Optional  false     Whether to store the extracted data in DocExtract and also return a list of image URLs.
output_format  String   Optional  pdf       Format of the final digitised document. Supported values are pdf and docx.
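
The rules in the table can be checked on the client before a file is uploaded. The sketch below is one possible way to do that; the function name build_form_fields is hypothetical, the language check only enforces lowercase because the full list of supported languages is not given here, and sending the boolean as the string "false" is an assumption based on the "true" used in the cURL example.

```python
SUPPORTED_OUTPUT_FORMATS = {"pdf", "docx"}  # the two documented values


def build_form_fields(api_key, language="original", keep_records=False,
                      output_format="pdf"):
    """Return the text form fields for a request, applying the documented defaults."""
    if not api_key:
        raise ValueError("api_key is required")
    # Only the case is checked; the server decides whether the name is supported.
    if not language or language != language.lower():
        raise ValueError("language must be a lowercase name such as 'english'")
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError("output_format must be 'pdf' or 'docx'")
    return {
        "api_key": api_key,
        "language": language,
        # Assumption: booleans are sent as the strings "true" / "false".
        "keep_records": "true" if keep_records else "false",
        "output_format": output_format,
    }
```

Rejecting a value such as Hindi locally avoids spending a round trip on a request that would fail validation on the server.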

Example: cURL Request


curl -X POST "https://docextract.ai/api/v1/docextract/digitised/pdf/" \
-F "file=@invoice.pdf" \
-F "api_key=ABCDE*****" \
-F "language=english" \
-F "keep_records=true" \
-F "output_format=pdf" \
-F "output_image_format=base64"
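
For callers who prefer Python over cURL, the same multipart/form-data request can be assembled with the standard library alone. This is a sketch rather than an official client: the helper names are hypothetical, and labelling the file part as application/pdf is an assumption. The code only builds the request object; nothing is sent until it is passed to urllib.request.urlopen.

```python
import uuid
import urllib.request

API_URL = "https://docextract.ai/api/v1/docextract/digitised/pdf/"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode())
    # The upload itself goes in the form field named "file".
    lines.append(f"--{boundary}".encode())
    lines.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode()
    )
    lines.append(b"Content-Type: application/pdf")  # assumption for the file part
    lines.append(b"")
    lines.append(file_bytes)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_request(fields, file_name, file_bytes):
    """Build (but do not send) the POST request for the digitisation endpoint."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send it, read the PDF bytes from disk, call build_request with the form fields, and open the result with urllib.request.urlopen; the response body is JSON.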
                        

Response Schema

Success Response (200 OK)

Returned when the file has been processed successfully.

{
    "status": "success",
    "pages": 1,
    "credits_remaining": 97620,
    "file_name": "rashan.jpg",
    "file_extension": "jpg",
    "file_size": 55008,
    "processed_url": "https://de-intelliteam.s3.ap-south-1.amazonaws.com/113/733/pdf/8a4b9d11-49c3-425e-9a37-3b3bfcbbdfc1.pdf"
}
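
A client will usually want the processed_url and the remaining credits out of this payload. The sketch below parses the JSON into a small typed record; the class and function names are illustrative, and the field list is taken only from the sample response above.

```python
import json
from dataclasses import dataclass


@dataclass
class DigitisedResult:
    pages: int
    credits_remaining: int
    file_name: str
    file_extension: str
    file_size: int
    processed_url: str


def parse_success(payload: str) -> DigitisedResult:
    """Parse a 200 OK body; raise if the status field is not 'success'."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError("response does not report success")
    return DigitisedResult(
        pages=data["pages"],
        credits_remaining=data["credits_remaining"],
        file_name=data["file_name"],
        file_extension=data["file_extension"],
        file_size=data["file_size"],
        processed_url=data["processed_url"],
    )
```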

Common Error Responses

Invalid API Key (401 Unauthorized)

Returned when the API key fails server-side validation.

{
    "error": "[APIKeyValidationError 401]: Unexpected server error during API key validation."
}

Missing Form Fields (400 Bad Request)

Returned if the file or api_key fields are missing from the request. This can also occur if the file field is present but has no data.

{
    "file": ["This field is required."],
    "api_key": ["This field is required."]
}

Unsupported File Format (400 Bad Request)

Returned when a file with an unsupported extension (e.g., .txt, .docx) is sent.

{
    "error": "Uploaded file is not a PDF."
}

Invalid Language (400 Bad Request)

Returned when an invalid language value is sent, for example a misspelled name (hindii) or a name that is not lowercase (Hindi).

{
    "language": "Invalid language selection."
}

No Credits Left

Returned when the account does not have enough credits left to process the document.

{
    "error": "Not enough credits: PDF pages=1, available credits=0"
}
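
The error text carries the page count and the available credits, which a client can extract to tell the user how many credits are missing. The sketch below assumes the message always follows the wording of the single sample above, and that one credit covers one page; both are assumptions, and the function name is hypothetical.

```python
import re

# Pattern inferred from the one sample message; the real format may vary.
_CREDITS_RE = re.compile(r"PDF pages=(\d+), available credits=(\d+)")


def parse_credit_error(message):
    """Return page/credit numbers from a 'Not enough credits' message, or None."""
    match = _CREDITS_RE.search(message)
    if match is None:
        return None
    pages, credits = int(match.group(1)), int(match.group(2))
    return {
        "pages": pages,
        "available_credits": credits,
        # Assumes one credit per page.
        "shortfall": max(pages - credits, 0),
    }
```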

Usage Limit Exceeded

Returned when the monthly page limit for the API key has been exceeded.

{
    "error": "Monthly page limit exceeded."
}

File Processing Error (500 Internal Server Error)

A server-side error returned when the backend fails to process the uploaded file.

{
    "error": "File processing failed"
}
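
The error bodies above come in two shapes: an object with a single "error" string, or an object that maps field names to validation messages. A client can branch on that shape, as in the sketch below. The category labels and the function name are illustrative, and matching on message text is an assumption made because status codes are not listed for every error.

```python
def classify_error(status_code, payload):
    """Map an error response to a short category label (labels are illustrative)."""
    if "error" in payload:
        message = str(payload["error"])
        if status_code == 401 or "APIKeyValidationError" in message:
            return "invalid_api_key"
        if message.startswith("Not enough credits"):
            return "no_credits"
        if message.startswith("Monthly page limit"):
            return "usage_limit"
        if "not a PDF" in message:
            return "unsupported_file"
        if status_code is not None and status_code >= 500:
            return "processing_error"
        return "unknown_error"
    # Otherwise the body maps field names to validation messages.
    return "invalid_fields:" + ",".join(sorted(payload))
```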

Processing Workflow

  1. API Key Validation - Verify authentication credentials
  2. File Reading & Format Check - Validate file format and integrity
  3. Page Count & Quota Check - Verify available credits
  4. Concurrent Page Processing - Asynchronous AI parsing
  5. Data Aggregation - Compile results in JSON/DataFrame format
  6. S3 Upload - Store results if enabled
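
Steps 3 to 5 above can be pictured with a small asyncio sketch. It is not DocExtract's server code: parse_page is a stand-in for the AI parsing step, the result fields are invented for illustration, and deducting one credit per page is an assumption.

```python
import asyncio


async def parse_page(page_number: int) -> dict:
    """Stand-in for the AI parsing of one page (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as real I/O-bound work would
    return {"page": page_number, "text": f"page {page_number}"}


async def process_document(page_count: int, credits: int) -> dict:
    # Step 3: refuse the job before doing any work if credits are insufficient.
    if page_count > credits:
        raise RuntimeError(
            f"Not enough credits: PDF pages={page_count}, available credits={credits}"
        )
    # Step 4: parse all pages concurrently.
    pages = await asyncio.gather(
        *(parse_page(n) for n in range(1, page_count + 1))
    )
    # Step 5: aggregate the per-page results; gather keeps the input order.
    return {
        "status": "success",
        "pages": page_count,
        "results": list(pages),
        "credits_remaining": credits - page_count,  # assumes one credit per page
    }
```

Running asyncio.run(process_document(3, 10)) returns three page results in page order, because asyncio.gather returns results in the order the awaitables were passed.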

Quota Management

Metric                  Description
credits                 Pages remaining in the current billing cycle
monthly_page_processed  Total pages processed this month
storage_used            Bytes stored in S3 storage
processed_url           URL of the output file after processing
file_size               Size of the file in KB

Security & Encryption

  • Unique API keys per organization
  • Sanitized file handling and validation
  • End-to-end encryption for extracted data
  • Built-in quota and abuse protection
