Posted On: Apr 21, 2022

Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from any document or image. Textract now gives you the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You do not need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats. Queries leverages a combination of visual, spatial, and language cues to extract the information you seek with high accuracy.

Traditional OCR solutions struggle to extract data accurately from most unstructured and semi-structured documents because of significant variations in how the data is laid out across multiple versions and formats of these documents. You need to implement custom post-processing code or manually review the information extracted from these documents. You also need to parse through the entire OCR output to extract the information you need for your business processes. With Queries, you can specify the information you need in the form of natural language questions (e.g., “What is the customer name”) and receive the exact information (e.g., “John Doe”) as part of the API response. Queries also lets you assign an alias to each question, making it easy to integrate the output with your downstream systems. Additionally, Queries is pre-trained on a large variety of unstructured, semi-structured, and structured documents. Some examples include paystubs, bank statements, W-2s, loan application forms, mortgage notes, and vaccine and insurance cards.
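As a rough illustration of the question-and-alias flow described above, the following Python sketch builds an Analyze Document request with the QUERIES feature type and maps each alias to its answer in the response. It assumes the boto3 SDK and configured AWS credentials; the helper names (build_queries_request, answers_by_alias, analyze), the alias CUSTOMER_NAME, and the default region are hypothetical choices made for this example, not part of the announcement.

```python
from typing import Dict


def build_queries_request(document_bytes: bytes, questions: Dict[str, str]) -> dict:
    """Build keyword arguments for an AnalyzeDocument call with Queries.

    `questions` maps an alias (a caller-chosen, hypothetical name) to a
    natural language question.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias}
                for alias, text in questions.items()
            ]
        },
    }


def answers_by_alias(response: dict) -> Dict[str, str]:
    """Map each query alias to the text of its answer block, if any."""
    blocks = response.get("Blocks", [])
    # Index answer blocks by Id so QUERY blocks can look them up.
    results = {b["Id"]: b for b in blocks if b["BlockType"] == "QUERY_RESULT"}
    answers: Dict[str, str] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        # Fall back to the question text when no alias was supplied.
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for result_id in rel["Ids"]:
                if result_id in results:
                    answers[alias] = results[result_id].get("Text", "")
    return answers


def analyze(path: str, questions: Dict[str, str], region: str = "us-east-1") -> Dict[str, str]:
    """Send a local document to Textract (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above work without the SDK

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as f:
        request = build_queries_request(f.read(), questions)
    return answers_by_alias(client.analyze_document(**request))
```

Calling analyze("paystub.png", {"CUSTOMER_NAME": "What is the customer name"}) with a hypothetical local file would return a dictionary keyed by alias, which is the shape a downstream system can consume directly. Queries that Textract cannot answer are simply absent from the result in this sketch.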

To learn more about this new feature, you can read a step-by-step blog post to get started now, or you can view the documentation. Pricing for this new feature is available on Amazon Textract’s pricing page.

Textract’s Analyze Document Queries will be available in US East (Ohio), US East (N. Virginia), US West (Northern California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), AWS GovCloud (US-East), and AWS GovCloud (US-West) starting March 31st, 2022. Click here to get started with Analyze Document Queries.