Amazon Textract

Amazon Textract is a machine learning (ML) service that is designed to use optical character recognition (OCR) to extract text, handwriting, and data from scanned documents.

General Features

Queries

Amazon Textract is designed to help you use the pretrained Queries feature with business-specific document types while you maintain control and ownership of your data.

Layout

Amazon Textract is designed to allow you to extract layout elements, such as paragraphs, titles, headers, and lists, from documents.

Optical Character Recognition

Amazon Textract OCR is designed to detect printed and handwritten text from documents and images.
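
As a hedged illustration, the sketch below reads detected lines and handwritten words from a response shaped like the output of the DetectDocumentText operation. The helper names `extract_lines` and `handwritten_words` and the sample response are assumptions made for this example; a real response would come from a call such as `detect_document_text` in the AWS SDK for Python.

```python
def extract_lines(response):
    """Return the text of every LINE block, in the order it was returned."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def handwritten_words(response):
    """Return the text of WORD blocks whose TextType is HANDWRITING."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "WORD"
        and block.get("TextType") == "HANDWRITING"
    ]


# Hand-written sample for illustration only (not real service output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Total due", "Confidence": 99.1},
        {"BlockType": "WORD", "Id": "w1", "Text": "Total", "TextType": "PRINTED"},
        {"BlockType": "WORD", "Id": "w2", "Text": "due", "TextType": "PRINTED"},
        {"BlockType": "LINE", "Id": "l2", "Text": "Paid", "Confidence": 91.4},
        {"BlockType": "WORD", "Id": "w3", "Text": "Paid", "TextType": "HANDWRITING"},
    ]
}

print(extract_lines(sample_response))
print(handwritten_words(sample_response))
```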

Form Extraction

Amazon Textract is designed to detect key-value pairs in document images and retain the context. A key-value pair is a set of linked data items.
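
The linked items in a key-value pair can be sketched as follows. This example assumes a response shaped like the output of the AnalyzeDocument operation with the FORMS feature, in which KEY_VALUE_SET blocks point to their value through a VALUE relationship and to their words through CHILD relationships. The helper names and the sample data are hypothetical, and selection elements such as checkboxes are ignored for brevity.

```python
def text_of(block, blocks_by_id):
    """Join the text of the WORD blocks listed in a block's CHILD relationships."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response):
    """Return {key text: value text} for every KEY block in the response."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(text_of(blocks_by_id[value_id], blocks_by_id))
        pairs[text_of(block, blocks_by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample for illustration only (not real service output).
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    ]
}

print(key_value_pairs(sample_response))
```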

Table Extraction

Amazon Textract is designed to preserve the composition of data stored in tables during extraction.
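
One way to see how table composition is preserved is to rebuild rows from the row and column indexes carried by each cell. The sketch below assumes a response shaped like the output of the AnalyzeDocument operation with the TABLES feature, where TABLE blocks list CELL blocks as children and each CELL carries 1-based `RowIndex` and `ColumnIndex` values. The function name `tables_as_rows` and the sample data are hypothetical, and merged cells are not handled.

```python
def tables_as_rows(response):
    """Return each table as a list of rows, each row a list of cell strings."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Collect the ids listed in a block's CHILD relationships.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return ids

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks_by_id[i]["Text"]
                for i in child_ids(cell)
                if blocks_by_id[i]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        n_rows = max((r for r, _ in cells), default=0)
        n_cols = max((c for _, c in cells), default=0)
        tables.append([
            [cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)
        ])
    return tables


# Hand-written sample for illustration only (not real service output).
sample_response = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Pen"},
        {"Id": "w4", "BlockType": "WORD", "Text": "2"},
    ]
}

print(tables_as_rows(sample_response))
```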

Signature Detection

Amazon Textract is designed to detect signatures on a document or image. The API response is designed to include the location of each signature and its associated confidence score.

Query-Based Extraction

Amazon Textract helps you specify the data you need to extract from documents using queries. Amazon Textract is designed to respond to natural language questions and return the answers as part of the API response. Textract Queries are pretrained on a variety of documents.
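
The request and response sides of a query can be sketched as follows. This example assumes the AnalyzeDocument operation with the QUERIES feature, where each question is passed in `QueriesConfig` and each QUERY block in the response links to a QUERY_RESULT block through an ANSWER relationship. The helper names, the alias, the question text, and the sample response are hypothetical.

```python
def build_query_request(image_bytes, questions):
    """Build AnalyzeDocument parameters from a {alias: question} mapping."""
    return {
        "Document": {"Bytes": image_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": question, "Alias": alias}
                for alias, question in questions.items()
            ]
        },
    }


def query_answers(response):
    """Return {alias or question: (answer text, confidence)} from a response."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {b["Id"]: b for b in blocks}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        name = block["Query"].get("Alias") or block["Query"]["Text"]
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                for result_id in rel["Ids"]:
                    result = blocks_by_id[result_id]
                    answers[name] = (result["Text"], result["Confidence"])
    return answers


# The parameters could be passed to the SDK as keyword arguments, for example
# client.analyze_document(**params) with a boto3 Textract client.
params = build_query_request(b"...", {"PAYEE": "Who is the payee?"})

# Hand-written sample for illustration only (not real service output).
sample_response = {
    "Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": "Who is the payee?", "Alias": "PAYEE"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
        {"Id": "r1", "BlockType": "QUERY_RESULT",
         "Text": "Jane Doe", "Confidence": 98.5},
    ]
}

print(query_answers(sample_response))
```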

Analyze Lending

The Analyze Lending API is a managed, preconfigured intelligent document processing API that is designed to extract information from loan packages. Its machine learning models are designed to classify and split the mortgage document package by document type.

Invoices and Receipts

Amazon Textract is designed to use ML to understand the context of invoices and receipts and to extract relevant data from them.
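
As a minimal sketch, the function below collects labeled summary fields from a response shaped like the output of the AnalyzeExpense operation, where each expense document carries a `SummaryFields` list with a normalized `Type` and a `ValueDetection`. The function name `summary_fields` and the sample values are assumptions made for this example, and line items are left out for brevity.

```python
def summary_fields(response):
    """Return {field type: detected value} across all expense documents."""
    fields = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            label = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if label and value:
                fields[label] = value
    return fields


# Hand-written sample for illustration only (not real service output).
sample_response = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {"Type": {"Text": "VENDOR_NAME", "Confidence": 99.0},
                 "ValueDetection": {"Text": "Example Supplies", "Confidence": 98.2}},
                {"Type": {"Text": "TOTAL", "Confidence": 97.5},
                 "ValueDetection": {"Text": "42.00", "Confidence": 96.8}},
            ]
        }
    ]
}

print(summary_fields(sample_response))
```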

Identity Documents

Amazon Textract is designed to use ML to understand the context of identity documents. You can extract information from an identity document.
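
Reading the extracted identity fields might look like the sketch below. It assumes a response shaped like the output of the AnalyzeID operation, where each entry in `IdentityDocuments` carries a list of `IdentityDocumentFields` with a `Type` and a `ValueDetection`. The function name `identity_fields` and the sample values are hypothetical.

```python
def identity_fields(response):
    """Return one {field type: value} dict per identity document."""
    documents = []
    for document in response.get("IdentityDocuments", []):
        fields = {}
        for field in document.get("IdentityDocumentFields", []):
            name = field["Type"]["Text"]
            value = field["ValueDetection"]["Text"]
            if value:  # skip fields that were not found on the document
                fields[name] = value
        documents.append(fields)
    return documents


# Hand-written sample for illustration only (not real service output).
sample_response = {
    "IdentityDocuments": [
        {
            "DocumentIndex": 1,
            "IdentityDocumentFields": [
                {"Type": {"Text": "FIRST_NAME"},
                 "ValueDetection": {"Text": "JANE", "Confidence": 99.2}},
                {"Type": {"Text": "LAST_NAME"},
                 "ValueDetection": {"Text": "DOE", "Confidence": 99.0}},
                {"Type": {"Text": "MIDDLE_NAME"},
                 "ValueDetection": {"Text": "", "Confidence": 80.1}},
            ],
        }
    ]
}

print(identity_fields(sample_response))
```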

Bounding Boxes

All extracted data is returned with bounding box coordinates, which define a polygon frame that encompasses each piece of identified data.
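
Coordinates in a block's `Geometry` are expressed as ratios of the page width and height, so they are typically scaled before drawing on an image. The sketch below shows one way to do that; the function names, the image size, and the sample geometry are assumptions made for this example.

```python
def box_to_pixels(bounding_box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, right, bottom) pixels."""
    left = bounding_box["Left"] * image_width
    top = bounding_box["Top"] * image_height
    right = (bounding_box["Left"] + bounding_box["Width"]) * image_width
    bottom = (bounding_box["Top"] + bounding_box["Height"]) * image_height
    return round(left), round(top), round(right), round(bottom)


def polygon_to_pixels(polygon, image_width, image_height):
    """Convert a ratio-based Polygon (list of X/Y points) to pixel tuples."""
    return [
        (round(point["X"] * image_width), round(point["Y"] * image_height))
        for point in polygon
    ]


# Hand-written sample geometry for illustration only.
geometry = {
    "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.5, "Height": 0.25},
    "Polygon": [
        {"X": 0.1, "Y": 0.2},
        {"X": 0.6, "Y": 0.2},
        {"X": 0.6, "Y": 0.45},
        {"X": 0.1, "Y": 0.45},
    ],
}

print(box_to_pixels(geometry["BoundingBox"], 1000, 800))
print(polygon_to_pixels(geometry["Polygon"], 1000, 800))
```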

Adjustable Confidence Thresholds

When information is extracted from documents, Amazon Textract is designed to return a confidence score for everything it identifies so that you can make informed decisions about how you want to use the results.
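
One possible way to act on confidence scores is to accept results at or above a chosen threshold and route the rest to manual review. In the sketch below, the function name `split_by_confidence`, the threshold of 90, and the sample blocks are assumptions made for this example; the appropriate threshold depends on your use case.

```python
def split_by_confidence(blocks, threshold=90.0):
    """Split blocks into (accepted, needs_review) by their Confidence score.

    Confidence is reported on a 0 to 100 scale. Blocks without a score
    are skipped.
    """
    accepted, needs_review = [], []
    for block in blocks:
        if "Confidence" not in block:
            continue
        if block["Confidence"] >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Hand-written sample for illustration only (not real service output).
sample_blocks = [
    {"BlockType": "PAGE", "Id": "p1"},
    {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1001", "Confidence": 99.3},
    {"BlockType": "LINE", "Id": "l2", "Text": "Arnount due", "Confidence": 71.8},
]

accepted, needs_review = split_by_confidence(sample_blocks)
print([b["Text"] for b in accepted])
print([b["Text"] for b in needs_review])
```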

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.