Optical Character Recognition (OCR)

Amazon Textract uses Optical Character Recognition (OCR) technology to automatically detect printed text and numbers in a scan or rendering of a document, such as a legal document or a scan of a book. 
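As a rough illustration, the detected text can be read out of a response with a few lines of Python. This is a minimal sketch, assuming the boto3 SDK and a response made up of "Blocks" entries with "BlockType" and "Text" fields. The helper names (detect_text, extract_lines) and the sample response are hypothetical and hand-written for this example; they are not output from a real document.

```python
def detect_text(image_bytes):
    """Call the DetectDocumentText operation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper below also works offline

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def extract_lines(response):
    """Return the text of every LINE block, in the order it appears in the response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]


# Hand-written stand-in for a real response, trimmed to the fields used here.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Chapter 1"},
        {"BlockType": "WORD", "Id": "w1", "Text": "Chapter"},
        {"BlockType": "WORD", "Id": "w2", "Text": "1"},
    ]
}

print(extract_lines(sample_response))
```

Parsing is kept separate from the API call so the parsing logic can be tried on saved responses without network access.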

Form Extraction

Amazon Textract automatically detects key-value pairs in document images, so you retain the inherent context of the document without manual intervention. A key-value pair is a set of linked data items: for instance, the field “First Name” on a document would be the key and “Jane” would be the value. This makes it easy to import the extracted data into a database or to pass it as a variable to an application. Traditional OCR solutions extract keys and values as simple text, and the relationship between them is lost unless hard-coded rules are written and maintained for each form.
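The linkage between a key and its value can be followed in code. The sketch below assumes a form-analysis response in which KEY_VALUE_SET blocks carry an "EntityTypes" list, a "VALUE" relationship from key to value, and "CHILD" relationships to WORD blocks. The function names and the sample response are hypothetical and written by hand to mirror the “First Name” example.

```python
def block_text(block, blocks_by_id):
    """Join the text of the WORD blocks that are children of the given block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each key's text to the text of the value block(s) it links to."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_ids = [
            value_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for value_id in rel["Ids"]
        ]
        value = " ".join(block_text(blocks_by_id[i], blocks_by_id) for i in value_ids)
        pairs[block_text(block, blocks_by_id)] = value
    return pairs


# Hand-written stand-in for a real response, trimmed to the fields used here.
sample_form = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "First"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Name"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    ]
}

print(extract_key_values(sample_form))
```

The resulting dictionary can be inserted into a database row or passed to application code directly, with no per-form rules.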

Table Extraction

Amazon Textract preserves the composition of data stored in tables during extraction. This is helpful for documents that consist largely of structured data, such as financial reports or medical records with column names in the top row of the table followed by rows of individual entries. You can use this feature to automatically load the extracted data into a database using a pre-defined schema. For example, rows of item numbers and quantities in an inventory report retain their association, which makes it easy to increment item totals in an inventory management application.
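To show how table structure might be rebuilt from a response, the sketch below assumes TABLE blocks whose CHILD relationships point to CELL blocks with 1-based "RowIndex" and "ColumnIndex" fields, and cells whose children are WORD blocks. The function names, the sample data, and the assumption that the first row holds the column names are illustrative only.

```python
def table_grids(response):
    """Rebuild each TABLE block as a list of rows, each row a list of cell strings."""
    by_id = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block):
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for i in rel["Ids"]]

    def cell_text(cell):
        return " ".join(by_id[i]["Text"] for i in child_ids(cell)
                        if by_id[i]["BlockType"] == "WORD")

    grids = []
    for table in response["Blocks"]:
        if table["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(table):
            cell = by_id[cell_id]
            if cell["BlockType"] == "CELL":
                cells[(cell["RowIndex"], cell["ColumnIndex"])] = cell_text(cell)
        if not cells:
            continue  # skip tables with no cells rather than fail on max()
        n_rows = max(row for row, _ in cells)
        n_cols = max(col for _, col in cells)
        grids.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                      for r in range(1, n_rows + 1)])
    return grids


def rows_as_records(grid):
    """Treat the first row as column names (an assumption) and return dict rows."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]


# Hand-written stand-in for a real response: a 2x2 inventory table.
sample_table = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Quantity"},
        {"Id": "w3", "BlockType": "WORD", "Text": "A-100"},
        {"Id": "w4", "BlockType": "WORD", "Text": "3"},
    ]
}

grid = table_grids(sample_table)[0]
print(rows_as_records(grid))
```

Each record keeps the item number tied to its quantity, which is the association an inventory application would rely on.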

Bounding Boxes

All extracted data is returned with the coordinates of a bounding box, a polygon frame that encompasses each piece of identified data, such as a single word, a line, a table, or even an individual cell within a table. This makes it possible to audit where a word or number came from in the source document, or to guide the user in document search systems that return scans of original documents as the search result. For example, when searching medical records for patient history details, users can easily note which source document a detail came from and where it appears on the page, for reference in future searches.
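To highlight a result on the original scan, the returned coordinates need to be mapped onto the image. The sketch below assumes each block carries a "Geometry" entry whose "BoundingBox" gives "Left", "Top", "Width", and "Height" as fractions of the page dimensions. The function name, the sample block, and the page size are hypothetical.

```python
def bounding_box_to_pixels(bounding_box, page_width, page_height):
    """Convert a ratio-based bounding box to (left, top, right, bottom) in pixels."""
    left = round(bounding_box["Left"] * page_width)
    top = round(bounding_box["Top"] * page_height)
    right = round((bounding_box["Left"] + bounding_box["Width"]) * page_width)
    bottom = round((bounding_box["Top"] + bounding_box["Height"]) * page_height)
    return left, top, right, bottom


# Hand-written sample block; the values are fractions of the page size.
sample_word = {
    "BlockType": "WORD",
    "Text": "Jane",
    "Geometry": {
        "BoundingBox": {"Left": 0.25, "Top": 0.125, "Width": 0.5, "Height": 0.0625}
    },
}

# Assumed page image size of 1000 x 2000 pixels.
box = bounding_box_to_pixels(sample_word["Geometry"]["BoundingBox"], 1000, 2000)
print(box)  # (250, 250, 750, 375)
```

The pixel tuple can be handed to an image library to draw a rectangle around the word in the scanned page.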

Adjustable Confidence Thresholds

When information is extracted from documents, Amazon Textract returns a confidence score for everything it identifies so that you can make informed decisions about how to use the results. For instance, if you are extracting information from tax documents and want to ensure high accuracy, you can create business logic that flags any extracted information with a confidence score lower than 95% for human review. You may choose a lower threshold for other types of documents where an error has little to no negative consequence, such as processing resumes or digitizing archived documents.
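The business logic described above can be as small as a single filter. This is a minimal sketch, assuming each block has a "Confidence" field on a 0 to 100 scale. The function name, the default threshold, and the sample values are hypothetical.

```python
def split_by_confidence(blocks, threshold=95.0):
    """Separate blocks into those at or above the threshold and those below it."""
    accepted, needs_review = [], []
    for block in blocks:
        if block.get("Confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)  # route these to a human reviewer
    return accepted, needs_review


# Hand-written sample blocks; confidence values are illustrative.
sample_blocks = [
    {"BlockType": "LINE", "Text": "Total income: 52,000", "Confidence": 99.2},
    {"BlockType": "LINE", "Text": "Deductions: 1,2O0", "Confidence": 87.5},
]

accepted, needs_review = split_by_confidence(sample_blocks, threshold=95.0)
print([b["Text"] for b in needs_review])
```

For lower-stakes documents, the same function can be called with a lower threshold so that fewer items are routed to review.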

Learn more about Amazon Textract pricing

Get started with Amazon Textract with no upfront commitments or long-term contracts.

Sign up for a free account

Instantly get access to the AWS Free Tier. 

Start building in the console

Get started building with Amazon Textract in the AWS Management Console.
