Amazon Textract

Easily extract text and data from virtually any document
Amazon Textract is a service that automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.

Many companies today extract data from documents and forms through manual data entry that’s slow and expensive or through simple optical character recognition (OCR) software that requires manual customization or configuration. Rules and workflows for each document and form often need to be hard-coded and updated with each change to the form or when dealing with multiple forms. If the form deviates from the rules, the output is often scrambled and unusable.

Amazon Textract overcomes these challenges by using machine learning to instantly “read” virtually any type of document to accurately extract text and data without the need for any manual effort or custom code. With Textract you can quickly automate document workflows, enabling you to process millions of document pages in hours. Once the information is captured, you can take action on it within your business applications to initiate next steps for a loan application or medical claims processing. Additionally, you can create smart search indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require redaction.


Benefits

Extract data quickly & accurately

Amazon Textract makes it easy to quickly and accurately extract data from documents, forms, and tables. Amazon Textract automatically detects a document’s layout and the key elements on the page, understands the data relationships in any embedded forms or tables, and extracts everything with its context intact. This means you can instantly use the extracted data in an application or store it in a database without a lot of complicated code in between.
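As an illustration of how the extracted form data keeps its context, here is a minimal Python sketch that turns an AnalyzeDocument-style response into label/value pairs. It assumes the AWS SDK for Python (boto3) and the block structure Textract returns for forms (KEY_VALUE_SET blocks linked to WORD blocks by relationships). The helper names and the sample response are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def _text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of a block's CHILD words (or selection status for checkboxes)."""
    parts: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(response: dict) -> Dict[str, str]:
    """Map each form label to its value, following KEY -> VALUE relationships."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs


def analyze_form(document_bytes: bytes) -> Dict[str, str]:
    """Call Textract on an image and return its form fields (needs AWS credentials)."""
    import boto3  # imported here so the parsing above can be tried offline

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"Bytes": document_bytes},
        FeatureTypes=["FORMS", "TABLES"],
    )
    return key_value_pairs(response)


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Applicant"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "John"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    ]
}

fields = key_value_pairs(SAMPLE_RESPONSE)
```

With the sample above, fields holds a single entry that maps the label "Applicant Name:" to "John Doe", which can be written to a database row or passed to an application as-is.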



No code or templates to maintain

Amazon Textract's pre-trained machine learning models eliminate the need to write code for data extraction, because they have already been trained on tens of millions of documents from virtually every industry, including contracts, tax documents, sales orders, enrollment forms, benefit applications, insurance claims, policy documents and many more. You no longer need to maintain code for every document or form you might receive or worry about how page layouts change over time.

Lower document processing costs

Amazon Textract provides OCR and structured data extraction (forms and tables) at very low cost, and you only pay for what you use. There are no upfront commitments or long-term contracts. You can easily process millions of documents using Amazon Textract's text extraction APIs.
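For large batches, the text extraction APIs can be called asynchronously against documents stored in Amazon S3. The following is a minimal sketch in Python, assuming a boto3 Textract client is passed in; the function name detect_lines, the polling interval, and the decision to treat any status other than SUCCEEDED as an error are illustrative choices, not a recommended design.

```python
import time
from typing import List


def detect_lines(client, bucket: str, key: str, poll_seconds: float = 5.0) -> List[str]:
    """Start an asynchronous text detection job and return the detected lines.

    `client` is expected to behave like boto3.client("textract").
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        result = client.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    # Assumption for this sketch: anything but SUCCEEDED is treated as a failure.
    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with {result['JobStatus']}")

    # Results are paginated; follow NextToken until it is absent.
    lines: List[str] = []
    while True:
        lines.extend(
            block["Text"]
            for block in result.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        token = result.get("NextToken")
        if not token:
            break
        result = client.get_document_text_detection(JobId=job_id, NextToken=token)
    return lines
```

A call might look like detect_lines(boto3.client("textract"), "my-bucket", "scans/loan-001.pdf"), where the bucket and object names are placeholders. Polling keeps the sketch short; for high volumes, having the job publish its completion status to an Amazon SNS topic is usually a better fit than polling each job.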

Use cases

Create smart search indexes

Extract structured data from documents and create a smart index that lets you search through millions of financial statements quickly. For example, a mortgage company could use Amazon Textract to process millions of scanned loan applications in a matter of hours and have the extracted data indexed in Amazon Elasticsearch Service. This would allow them to create search experiences like “search for loan applications where applicant name is John Doe,” or “search contracts where the interest rate is 2 percent.”
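To make the idea of field-level search concrete, here is a small Python sketch of an in-memory index keyed by field name and value. It is only a stand-in for a real search service: the FieldIndex class, the normalization rule, and the sample documents are all hypothetical, and the input is assumed to be label/value pairs already extracted from each document.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop colons, and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().replace(":", " ").split())


class FieldIndex:
    """Maps (field name, field value) to the documents that contain that pair."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, doc_id: str, fields: Dict[str, str]) -> None:
        """Index every extracted label/value pair of one document."""
        for name, value in fields.items():
            self._postings[(normalize(name), normalize(value))].add(doc_id)

    def search(self, name: str, value: str) -> List[str]:
        """Return the sorted IDs of documents where the field has the given value."""
        return sorted(self._postings.get((normalize(name), normalize(value)), set()))


index = FieldIndex()
index.add("loan-001.pdf", {"Applicant Name:": "John Doe", "Interest Rate:": "2%"})
index.add("loan-002.pdf", {"Applicant Name:": "Jane Roe", "Interest Rate:": "2%"})

matches = index.search("applicant name", "john doe")
```

Here matches contains only loan-001.pdf, while a search for an interest rate of "2%" returns both documents. A production search service would add tokenization, ranking, and persistence on top of the same field-and-value idea.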

Build automated document processing workflows

Amazon Textract can provide the inputs required to automatically process forms without human intervention. For example, banks can automate loan applications using Amazon Textract. The information contained in the document could be used to initiate all of the background and credit checks needed to approve the loan, so that customers get an instant result on their application rather than waiting several days for manual review and validation.

Maintain compliance in document archives

Because Amazon Textract identifies data types and form labels automatically, it’s easy to maintain compliance with information controls. For example, an insurer could use Amazon Textract to feed a workflow that automatically recognizes the key-value pairs that require protection and redacts personally identifiable information (PII) for the insurer’s review before claim forms are archived.
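A redaction step of this kind can be sketched in a few lines of Python. The example assumes the form's label/value pairs have already been extracted, and it masks any field whose label appears in a list of sensitive labels. The label list, the mask text, and the redact_fields function are hypothetical; a real workflow would define its own rules for what counts as PII.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical labels an insurer might treat as sensitive.
SENSITIVE_LABELS = {"ssn", "social security number", "date of birth", "policy number"}


def _normalize(label: str) -> str:
    """Lowercase a form label and drop surrounding spaces and a trailing colon."""
    return label.lower().strip().rstrip(":").strip()


def redact_fields(
    fields: Dict[str, str],
    sensitive_labels: Iterable[str] = SENSITIVE_LABELS,
    mask: str = "[REDACTED]",
) -> Tuple[Dict[str, str], List[str]]:
    """Return a masked copy of the fields plus the labels flagged for review."""
    wanted = {_normalize(label) for label in sensitive_labels}
    redacted: Dict[str, str] = {}
    flagged: List[str] = []
    for label, value in fields.items():
        if _normalize(label) in wanted:
            redacted[label] = mask
            flagged.append(label)
        else:
            redacted[label] = value
    return redacted, flagged


claim = {"Claimant Name:": "Jane Roe", "SSN:": "123-45-6789", "Claim Amount:": "$1,200"}
redacted_claim, needs_review = redact_fields(claim)
```

In this example only the "SSN:" field is masked and listed in needs_review, so a reviewer can confirm the redaction before the form is archived. The original dictionary is left unchanged.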

Customer success


Cambia Health Solutions is a total health solutions company and the parent company of six regional health plans, including Regence, an insurer serving 2.6 million members in Oregon, Idaho, Utah and Washington.

“Over the past 100 years Cambia has been dedicated to improving health care for people and their families. To help us achieve that goal, we’re always evaluating new innovations and opportunities to optimize care coordination. One area of focus is streamlining administrative processes that are time and labor intensive. We’re excited to explore Amazon Textract to help us automate the process of extracting valuable data from paper forms accurately and efficiently. The powerful combination of data science, A.I., and a person-focused approach is key to our mission of transforming the health care system.”

Faraz Shafiq, Chief Artificial Intelligence Officer - Cambia Health Solutions



Change Healthcare is a leading independent healthcare technology company that provides data and analytics-driven solutions to improve clinical, financial and patient engagement outcomes in the U.S. healthcare system.

“At Change Healthcare, we believe that we can make healthcare affordable and accessible to all by improving the timeliness and quality of financial and administrative decisions. This can be achieved by the power of machine learning technology to understand more from our data. But unlocking the potential of this information can often be difficult as it's siloed in tables and forms that traditional optical character recognition hasn't been able to analyze. Amazon Textract further advances document understanding with the ability to retrieve structured data in addition to text, and now with the service becoming HIPAA compliant, we'll be able to liberate the information from millions of documents and create even more value for patients, payers, and providers.”

Nick Giannasi, EVP and Chief AI Officer - Change Healthcare



ClearDATA’s innovative platform of solutions and services protects customers from data privacy risks, improves their data management, and scales their healthcare IT infrastructure, enabling the industry to focus on making healthcare better by improving healthcare delivery, every single day.

“It’s exciting to see AWS add their optical character recognition service powered by machine learning, Textract, to their list of HIPAA eligible services. A lot of medical data that is shared among payers and providers is locked in image-based files like PDFs. Instead of manually processing that kind of data, healthcare organizations can now use Amazon Textract service to extract medical data from files that previously have been non-machine readable. This brings an opportunity to integrate this data with their electronic health records, or other cloud technologies like Amazon Comprehend Medical which can identify protected health information in the dataset. This is just another step forward in increasing the opportunity to use these emerging technologies to improve access to data, get better insights, lower costs, and improve patient and member experiences.”

Matt Ferrari, Chief Technology Officer - ClearDATA

Check out Amazon Textract features

Discover more Amazon Textract features.

Sign up for a free account

Instantly get access to the AWS Free Tier. 

Start building in the console

Get started building with Amazon Textract in the AWS Management Console.
