Introduction
What is Document AI?
Google Cloud Document AI helps organizations unlock the value hidden in their unstructured documents. It uses pre-trained machine learning models, exposed as processors, to extract, classify, and structure information from document types such as invoices, receipts, contracts, and forms. For organizations dealing with a growing volume of unstructured documents, it turns data that would otherwise need manual review into structured output that can drive automation and better-informed decisions.
Key Features
Automated Information Extraction: Extract text, key-value pairs, entities, and tables from diverse document formats (PDF, scans, images).
Efficient Task Automation: Streamline workflows by automating tedious manual tasks like data entry and document classification.
Customizable Model Development: Train and deploy custom models tailored to specific document types and information needs.
Integrated Cloud Environment: Seamlessly connect Document AI with other Google Cloud services for holistic data processing and analysis.
Enhanced Accuracy and Scalability: Benefit from pre-trained models and fine-tuning capabilities for high-precision document understanding, delivered as a managed API that scales with your document volume.
Scope of this blog
This blog is an in-depth guide to calling the Document AI REST API with the curl command-line tool. It walks through the setup step by step and covers the errors you are most likely to hit the first time you try it. I will cover other aspects of Document AI in future posts. For further details, refer to the official Document AI documentation:
Official Document AI GCP documentation
Prerequisites
curl installed
Google Cloud CLI (gcloud) installed
A Google Cloud project with the Document AI API enabled
An account (user or service account) with the Document AI API User role (roles/documentai.apiUser)
Step-by-Step Guide
Authentication
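Initialize the gcloud CLI and sign in:
gcloud init
To authenticate as a service account instead, activate its key:
gcloud auth activate-service-account --key-file=<key_file.json>
Obtain an access token for the active account:
gcloud auth print-access-token
If you prefer Application Default Credentials, run gcloud auth application-default login first, then use gcloud auth application-default print-access-token. Without that login step the application-default command fails even after gcloud init.
Save this access token for later use. Tokens are short-lived (typically one hour), so generate a new one if requests start failing with 401 errors.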
Constructing the command
The basic structure is:
curl -X POST -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d @<input_file.json> -o <output_file.json> <prediction_endpoint>
Remember to replace placeholders with your specific details.
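Pasting the token into the command by hand gets tedious once it expires. The command above can be wrapped in a small Bash function that fetches a fresh token on every call. This is a minimal sketch, assuming the gcloud CLI is installed and already signed in; the function name docai_process is hypothetical.

```shell
# Hypothetical helper: POST a request file to a Document AI prediction endpoint.
# Usage: docai_process <prediction_endpoint> <input_file.json> <output_file.json>
docai_process() {
  local endpoint="$1" input="$2" output="$3"
  local token
  # Fetch a short-lived access token for the active gcloud account.
  token="$(gcloud auth print-access-token)" || return 1
  curl -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "@${input}" \
    -o "${output}" \
    "${endpoint}"
}
```

Usage: docai_process "<prediction_endpoint>" input_file.json output_file.json. Fetching the token inside the function means it never has to be copied into a file or into your shell history.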
Creating a General Purpose Processor
In the Google Cloud Console, navigate to the Document AI section.
Click on the Explore processors button.
Click on Create Processor and select Form Parser as the processor type.
Give your processor a name and select a region.
Click Create to create the processor.
Once created, go to the Overview tab and copy the Prediction Endpoint for your new processor.
Obtaining the Prediction Endpoint
Open the Google Cloud Console and navigate to the Document AI section.
Select the Processors tab.
Click on the name of the General Purpose Processor you want to use.
In the Overview tab, copy the Prediction Endpoint listed under API details. This is what you need in your curl command.
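The prediction endpoint generally follows a fixed pattern, so it can also be assembled from its parts in a script. This sketch assumes the v1 API and the URL pattern shown in the code; the console usually shows the project number rather than the project ID, and either is meant to go in that slot. Compare the output with the endpoint the console shows before relying on it. The function name docai_endpoint is hypothetical.

```shell
# Hypothetical helper: assemble a Document AI v1 "process" endpoint from its parts.
# Assumed pattern: https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT/locations/LOCATION/processors/PROCESSOR_ID:process
# LOCATION is the processor's region, for example "us" or "eu".
docai_endpoint() {
  local project="$1" location="$2" processor_id="$3"
  printf 'https://%s-documentai.googleapis.com/v1/projects/%s/locations/%s/processors/%s:process\n' \
    "$location" "$project" "$location" "$processor_id"
}
```

For example, docai_endpoint my-project us abc123 prints https://us-documentai.googleapis.com/v1/projects/my-project/locations/us/processors/abc123:process.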
Input File Structure
input_file.json must have the following structure:
{ "inlineDocument": { "mimeType": "<mime_type>", "content": "<base64_encoded_content>" } }
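Make sure <mime_type> is one of the supported formats: application/pdf, image/gif, image/tiff, image/jpeg, image/png, image/bmp, or image/webp. The content field holds the document's raw bytes encoded as a single base64 string.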
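Rather than typing the MIME type by hand, a small helper can derive it from the file extension. This is a hypothetical sketch that maps only the formats listed above. The extension is just a hint: if a file has been renamed and its bytes do not match, the API can still reject it.

```shell
# Hypothetical helper: guess the Document AI mimeType from a file extension.
# Only the supported formats are mapped; anything else is rejected.
docai_mime_type() {
  local ext="${1##*.}"
  # Lowercase with tr (portable to the Bash 3.2 that ships with macOS).
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf)      echo "application/pdf" ;;
    gif)      echo "image/gif" ;;
    tif|tiff) echo "image/tiff" ;;
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    bmp)      echo "image/bmp" ;;
    webp)     echo "image/webp" ;;
    *)        echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}
```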
Encoding the content securely
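Avoid online base64 converters. Uploading confidential documents to a third-party site is a security risk, and copying very long strings out of a browser can introduce line breaks or truncate the content.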
Encode content locally using platform-specific methods:
Windows
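PowerShell
$pdfBytes = [System.IO.File]::ReadAllBytes("<full_path_to_pdf>")
$base64String = [System.Convert]::ToBase64String($pdfBytes)
# Write plain UTF-8 with no BOM; "echo ... > test.txt" writes UTF-16 in Windows PowerShell 5.1
[System.IO.File]::WriteAllText("$PWD\test.txt", $base64String)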
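Linux/macOS
Bash
base64 < <path_to_pdf> | tr -d '\n' > test.txt
GNU base64 on Linux wraps its output every 76 characters. The tr -d '\n' step removes those line breaks, which are not allowed inside a JSON string.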
Extract the base64 string from test.txt and put it in the content field of input_file.json.
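The copy-and-paste step can be skipped by writing input_file.json directly from the shell. This is a minimal sketch, assuming Bash on Linux or macOS; the function name docai_make_request is hypothetical. It pipes base64 through tr to remove line breaks, then writes the same inlineDocument structure shown earlier.

```shell
# Hypothetical helper: write input_file.json without copy-pasting base64 by hand.
# Usage: docai_make_request <document_path> <mime_type> <output_json>
docai_make_request() {
  local doc="$1" mime="$2" out="$3"
  local content
  [ -r "$doc" ] || { echo "cannot read $doc" >&2; return 1; }
  # GNU base64 wraps at 76 columns; a raw newline inside a JSON string is invalid.
  content="$(base64 < "$doc" | tr -d '\n')"
  # Base64 output contains only A-Z, a-z, 0-9, +, / and =, so no JSON escaping is needed.
  printf '{"inlineDocument": {"mimeType": "%s", "content": "%s"}}\n' \
    "$mime" "$content" > "$out"
}
```

Usage: docai_make_request invoice.pdf application/pdf input_file.json. Because printf is a Bash builtin, the long base64 string is not subject to the command-line length limit that applies to external programs.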
Executing the command
Navigate to the directory containing input_file.json. curl creates output_file.json for you, so it does not need to exist beforehand.
Run the curl command with your specific details.
The extracted information will be saved in output_file.json.
Troubleshooting
401 Authentication Error
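Generate a fresh access token (tokens expire) and confirm that the account has the Document AI API User role. A 403 response instead usually points to a missing role or to the Document AI API not being enabled in the project.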
MIME Type Not Acceptable
Confirm that mimeType in input_file.json is one of the supported formats listed above and matches the actual file.
Invalid Content Input
Encode the file locally with the methods above instead of using online converters, and make sure the content string contains no line breaks.
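When a request fails, the HTTP status code is the quickest clue. Adding -s -w '%{http_code}' to the curl command prints it, while the JSON error body still goes to the -o file. The helper below is a hypothetical sketch that maps common codes to the checks above; it assumes Document AI follows the standard Google API mapping (400 invalid argument, 401 unauthenticated, 403 permission denied, 404 not found).

```shell
# Hypothetical helper: turn an HTTP status code into a troubleshooting hint.
docai_explain_status() {
  case "$1" in
    2??) echo "OK: the response was written to the output file" ;;
    400) echo "Invalid argument: check mimeType and that content is unwrapped base64" ;;
    401) echo "Unauthenticated: the access token is missing, malformed, or expired" ;;
    403) echo "Permission denied: check the API is enabled and the account's role" ;;
    404) echo "Not found: check the project, location, and processor ID in the endpoint" ;;
    *)   echo "Unexpected HTTP status $1: read the error message in the response body" ;;
  esac
}
```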
Beyond Basics
Supported features
Explore the Document AI API documentation for details on:
Supported processors and their functionalities.
Extractable information types (e.g., text, entities, tables).
Language support for different processors.
Tailor your curl commands to extract specific information based on your processor's capabilities.
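One way to tailor a request is the fieldMask field of the v1 process request, which limits which parts of the returned document are included. This sketch assumes that field and a comma-separated list of top-level Document field names such as text and entities; check the API reference for the paths your processor supports. The function name is hypothetical.

```shell
# Hypothetical helper: build a request whose fieldMask trims the response.
# Usage: docai_make_masked_request <document_path> <mime_type> <field_mask> <output_json>
docai_make_masked_request() {
  local doc="$1" mime="$2" mask="$3" out="$4"
  local content
  [ -r "$doc" ] || { echo "cannot read $doc" >&2; return 1; }
  # Single-line base64, as required inside a JSON string.
  content="$(base64 < "$doc" | tr -d '\n')"
  # A FieldMask is serialized in JSON as one comma-separated string of paths.
  printf '{"inlineDocument": {"mimeType": "%s", "content": "%s"}, "fieldMask": "%s"}\n' \
    "$mime" "$content" "$mask" > "$out"
}
```

Usage: docai_make_masked_request invoice.pdf application/pdf "text,entities" input_file.json. A smaller response is easier to inspect and faster to download when you only need a few fields.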
Environment Variables
Consider using environment variables for sensitive information like access tokens, so they stay out of your scripts, and to manage multiple accounts more easily.
Tools like dotenv can simplify environment variable usage in shell scripts.
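In plain Bash, a similar effect to dotenv can be had with set -a, which exports every variable assigned while it is active. The sketch below assumes a .env file of simple KEY=value lines; the function names load_env and require_env are hypothetical. Because access tokens expire, values such as the endpoint or project are better candidates for the file, with the token fetched at run time.

```shell
# Hypothetical helper: export every KEY=value line from a .env file.
load_env() {
  local file="${1:-.env}"
  [ -f "$file" ] || { echo "no env file: $file" >&2; return 1; }
  set -a        # auto-export every variable assigned from here on
  . "$file"
  set +a
}

# Hypothetical helper: fail early with a clear message if a variable is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    [ -n "${!name:-}" ] || { echo "missing required variable: $name" >&2; return 1; }
  done
}
```

Usage: load_env && require_env DOCAI_ENDPOINT, where DOCAI_ENDPOINT is an example variable name. Keep the .env file out of version control.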
Error Handling
Incorporate error-handling mechanisms in your curl commands to gracefully handle potential issues like network failures or API errors.
Utilize curl exit codes and conditionals to provide informative feedback to the user.
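A minimal sketch of this pattern, assuming Bash: curl's exit code signals transport failures (DNS, TLS, timeouts), while -w '%{http_code}' exposes API errors that curl itself treats as success. The function name docai_call is hypothetical.

```shell
# Hypothetical helper: call the endpoint and report curl and HTTP failures separately.
# Usage: docai_call <prediction_endpoint> <access_token> <input_file.json> <output_file.json>
docai_call() {
  local endpoint="$1" token="$2" input="$3" output="$4"
  local status rc=0
  # -s hides the progress meter; -w prints only the HTTP status code to stdout.
  status="$(curl -s -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "@${input}" \
    -o "${output}" \
    -w '%{http_code}' \
    "${endpoint}")" || rc=$?
  if [ "$rc" -ne 0 ]; then
    # A non-zero curl exit code means the request never completed.
    echo "curl failed with exit code $rc" >&2
    return "$rc"
  fi
  if [ "$status" -lt 200 ] || [ "$status" -ge 300 ]; then
    # The API's JSON error message was written to the output file.
    echo "Document AI returned HTTP $status; see $output for details" >&2
    return 1
  fi
  echo "Success: response saved to $output"
}
```

Usage: docai_call "<prediction_endpoint>" "$(gcloud auth print-access-token)" input_file.json output_file.json. Returning curl's own exit code keeps network failures distinguishable from API errors in calling scripts.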
Refer to the official Document AI API documentation for a full picture of its advanced features. This guide gives you a solid foundation for calling the Document AI REST API with curl, and exploring the API further opens up more possibilities for automating document processing.
Thanks for reading :)