Amazon Textract

Easily extract text and data from virtually any document
Amazon Textract is a service that automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.

Many companies today extract data from documents and forms through manual data entry that’s slow and expensive or through simple optical character recognition (OCR) software that requires manual customization or configuration. Rules and workflows for each document and form often need to be hard-coded and updated with each change to the form or when dealing with multiple forms. If the form deviates from the rules, the output is often scrambled and unusable.

Amazon Textract overcomes these challenges by using machine learning to instantly “read” virtually any type of document to accurately extract text and data without the need for any manual effort or custom code. With Textract you can quickly automate document workflows, enabling you to process millions of document pages in hours. Once the information is captured, you can take action on it within your business applications to initiate next steps for a loan application or medical claims processing. Additionally, you can create smart search indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require redaction.

Benefits

Extract data quickly & accurately

Amazon Textract makes it easy to quickly and accurately extract data from documents, forms, and tables. Amazon Textract automatically detects a document’s layout and the key elements on the page, understands the data relationships in any embedded forms or tables, and extracts everything with its context intact. This means you can instantly use the extracted data in an application or store it in a database without a lot of complicated code in between.
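As a rough illustration of what "context intact" can look like in practice, the sketch below walks a Textract-style response and pairs each form key with its value. The function names (`form_fields`, `_text_of`) and the hand-written `sample_response` are assumptions made for this example. In a real application the response would come from a call such as boto3's `analyze_document` with `FeatureTypes=["FORMS"]`, which is not invoked here.

```python
from typing import Dict, List


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD children of a block (hypothetical helper)."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(response: dict) -> Dict[str, str]:
    """Map each detected form key to the text of its linked value."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    fields: Dict[str, str] = {}
    for block in response["Blocks"]:
        # Keys and values are both KEY_VALUE_SET blocks; EntityTypes tells them apart.
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value = " ".join(_text_of(by_id[i], by_id) for i in rel["Ids"])
        fields[_text_of(block, by_id)] = value
    return fields


# Hand-written sample in the shape of a Textract response; not real service output.
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Applicant"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "John"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    ]
}

print(form_fields(sample_response))
```

The resulting dictionary can be written to a database row or passed to application code directly.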



No code or templates to maintain

Amazon Textract's pre-trained machine learning models eliminate the need to write code for data extraction, because they have already been trained on tens of millions of documents from virtually every industry, including contracts, tax documents, sales orders, enrollment forms, benefit applications, insurance claims, policy documents and many more. You no longer need to maintain code for every document or form you might receive or worry about how page layouts change over time.

Lower document processing costs

Amazon Textract provides OCR and structured data extraction (forms and tables) at very low cost, and you only pay for what you use. There are no upfront commitments or long-term contracts. You can easily process millions of documents using Amazon Textract's text extraction APIs.

Use cases

Create smart search indexes

Extract structured data from documents and create a smart index that lets you search through millions of financial statements quickly. For example, a mortgage company could use Amazon Textract to process millions of scanned loan applications in a matter of hours and have the extracted data indexed in Amazon Elasticsearch Service. This would allow them to create search experiences like “search for loan applications where applicant name is John Doe,” or “search contracts where the interest rate is 2 percent.”
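To make the idea of a field-level search concrete, here is a minimal in-memory sketch. It is only a stand-in for a real search service, and the document IDs, field names, and the `build_index` and `search` helpers are all hypothetical. It assumes each document has already been reduced to a dictionary of extracted fields.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]


def build_index(documents: Dict[str, Dict[str, str]]) -> Index:
    """Index every (field, value) pair to the documents that contain it."""
    index: Index = defaultdict(set)
    for doc_id, fields in documents.items():
        for name, value in fields.items():
            index[(name.lower(), value.lower())].add(doc_id)
    return index


def search(index: Index, field: str, value: str) -> List[str]:
    """Return the sorted IDs of documents whose field exactly matches value."""
    return sorted(index.get((field.lower(), value.lower()), set()))


# Hypothetical extracted fields for three scanned loan applications.
documents = {
    "loan-001": {"Applicant name": "John Doe", "Interest rate": "2%"},
    "loan-002": {"Applicant name": "Jane Roe", "Interest rate": "3%"},
    "loan-003": {"Applicant name": "John Doe", "Interest rate": "3%"},
}

index = build_index(documents)
print(search(index, "applicant name", "john doe"))
```

A production search index would add tokenization, ranking, and persistence; the exact-match lookup here only shows how extracted fields become queryable.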

Build automated document processing workflows

Amazon Textract can provide the inputs required to automatically process forms without human intervention. For example, banks can automate loan applications using Amazon Textract. The information contained in the document could be used to initiate all of the background and credit checks needed to approve the loan, so that customers get an instant decision on their application rather than waiting several days for manual review and validation.

Maintain compliance in document archives

Because Amazon Textract identifies data types and form labels automatically, it’s easy to maintain compliance with information controls. For example, an insurer could use Amazon Textract to feed a workflow that automatically recognizes the key-value pairs that require protection and redacts personally identifiable information (PII) for review before claim forms are archived.
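One way such a redaction step could look, once form fields have been extracted as key-value pairs, is sketched below. The label list, the `[REDACTED]` mask, and the `normalize` and `redact` helpers are assumptions for illustration, not part of the service. A real workflow would choose labels to match its own compliance rules.

```python
from typing import Dict, FrozenSet

# Hypothetical set of form labels whose values must be protected.
SENSITIVE_LABELS: FrozenSet[str] = frozenset(
    {"ssn", "social security number", "date of birth"}
)


def normalize(label: str) -> str:
    """Lowercase a form label and drop surrounding spaces and a trailing colon."""
    return label.strip().rstrip(":").strip().lower()


def redact(
    fields: Dict[str, str],
    sensitive: FrozenSet[str] = SENSITIVE_LABELS,
    mask: str = "[REDACTED]",
) -> Dict[str, str]:
    """Return a copy of fields with the values of sensitive labels masked."""
    return {
        key: (mask if normalize(key) in sensitive else value)
        for key, value in fields.items()
    }


# Hypothetical fields extracted from a claim form.
claim = {"SSN:": "123-45-6789", "Date of Birth": "1980-01-01", "Claim number": "A-17"}
print(redact(claim))
```

Matching on normalized labels rather than on value patterns keeps the sketch short; flagged fields could also be routed to a human reviewer instead of being masked outright.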

Customer success

Cox Automotive
“At Cox Automotive, we are looking to transform the way the world buys, sells, owns and uses cars. To further modernize our automotive solutions, we will be leveraging Amazon Textract to accelerate how quickly cars can be transacted. With Amazon Textract, we can automatically capture and validate data from documents and forms, such as loan applications or vehicle titles, so decisions can be made more quickly. This will reduce customer effort and further streamline the process for everyone involved from the manufacturer to the buyer.”

Bryan Landerman, Chief Technology Officer - Cox Automotive


Healthfirst

Healthfirst is a not-for-profit managed care organization and one of the fastest growing health plans in New York with over 1.4M diverse members and a network of more than 35,000 providers and 4,500 employees.

“At Healthfirst, we are building data pipelines to turn scanned medical charts into useful clinical information to improve care coordination, drive quality outcomes, and ensure appropriate reimbursement for members under our coverage. We use Amazon Textract and Amazon Comprehend Medical to glean real value from unstructured data sources in an efficient way, resulting in revenue savings 10-20 times more than our usual downstream operation. By scaling up to analyze over 50,000 charts, we can find undocumented diagnoses and refer around 5,000 members for the care management they need.”

Steve Prewitt, Chief Analytics Officer - Healthfirst


Met Office

The Met Office is the UK’s national weather service, and is a world leader in providing weather and climate services.

"We hope to use Textract to digitize millions of historical weather observations from document archives. Making these observations available to science will improve our understanding of climate variability and change."

Philip Brohan, Climate Scientist - Met Office


The Globe and Mail

The Globe and Mail is a national icon and Canada’s most recognized media brand.

"As a news media company, we rely on many PDF or scanned-source documents such as FOIs (freedom of information requests) that have important information contained in tables that we previously couldn't access. These documents have been under-utilized because journalists were not able to access them easily or didn't know they existed. Using Amazon Textract, we are able to extract information from tables in PDFs and easily output that data to CSV and offer easy access to these documents by making them available for search queries by our journalists. This increases efficient access to information for our journalist by tenfold."

Michael O’Neill, Managing Director, Digital and Data Science - The Globe and Mail


Roche
"Roche's NAVIFY decision support portfolio provides solutions that accelerate research and enable personalized healthcare. With petabytes of medical PDF documents being generated in hospital systems every day, we needed a document extraction service to handle documents that have no standardized format. Amazon Textract provides the functionality to help us extract text from medical documents, so that we can then apply Natural Language Processing (NLP) to build a comprehensive, longitudinal view of patients, and enable both decision support and population analytics."

Ram Balasubramanian, Sr. Director of Software Engineering - Roche Diagnostics Information Solutions
