Amazon Textract

Use Optical Character Recognition (OCR) to extract text from documents

Optical character recognition, or optical character reader (OCR), is the electronic or mechanical conversion of images of text into machine-encoded text, whether the source is a scanned document, a PDF, or a photo of a document.

Traditional OCR technologies typically cannot recognize common layouts such as forms and tables, and usually generate a lengthy text dump. What organizations want instead is the ability to accurately identify and extract text and data from forms and tables, across a variety of file types and templates.

Using advanced machine learning, Amazon Textract goes beyond traditional OCR software: it identifies not only each character and word but also the contents of fields in forms and the information stored in tables, for scanned images, documents, and PDFs.
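To make the idea of "contents of fields in forms" concrete, here is a minimal Python sketch of how form fields might be read out of a Textract-style response. It assumes the response is a dictionary of `Blocks`, where `KEY_VALUE_SET` blocks link to their values and to child `WORD` blocks through `Relationships`. In practice such a response would come from a call like boto3's `analyze_document` with `FeatureTypes=["FORMS"]`. The helper names and `SAMPLE_RESPONSE` are made up for illustration, and the sketch ignores checkboxes and other selection elements.

```python
def related_ids(block, relation_type):
    """Return the Ids a block points to through one relationship type."""
    ids = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == relation_type:
            ids.extend(relationship["Ids"])
    return ids


def block_text(block, blocks_by_id):
    """Join the text of the WORD blocks that are children of a block."""
    words = []
    for child_id in related_ids(block, "CHILD"):
        child = blocks_by_id[child_id]
        if child["BlockType"] == "WORD":
            words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response):
    """Build a {field name: field value} dict from KEY_VALUE_SET blocks."""
    blocks_by_id = {block["Id"]: block for block in response.get("Blocks", [])}
    fields = {}
    for block in blocks_by_id.values():
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue  # VALUE blocks are reached through their KEY block
        key = block_text(block, blocks_by_id)
        value_parts = [
            block_text(blocks_by_id[value_id], blocks_by_id)
            for value_id in related_ids(block, "VALUE")
        ]
        fields[key] = " ".join(part for part in value_parts if part)
    return fields


# Hand-written sample for illustration only; not real service output.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Employee"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    ]
}

if __name__ == "__main__":
    print(extract_form_fields(SAMPLE_RESPONSE))
```

The point of the sketch is that each field name stays paired with its value, which a plain text dump does not preserve.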

Use Cases

Financial Services

W-2s, mortgage applications, and many other financial forms all have different formats, and they hold valuable information about your customers. You need this information to reach specific outcomes such as loan approvals and tax allocations.

Amazon Textract can help with your toughest extractions, such as tables and forms, and can process dense text using Optical Character Recognition (OCR) in minutes. Put machine learning to work on your paperwork and cut processes down from days to minutes.



Healthcare

Extract patient data from health intake forms, insurance claims, and pre-authorization forms. These documents have important information locked within them, and extracting it helps you better serve your patients and insurers.

Textract can scan thousands of healthcare and insurance forms and extract the information within them using Optical Character Recognition, without continued configuration. Where normal OCR technology provides a data dump of text, Textract keeps your information organized and in its original context, saving you the time of manually reviewing the output.


Legal

Legal contracts and case file documents hold the information your clients need to make decisions. Organizations spend thousands of hours manually reviewing these documents and pulling out useful information, which is time-consuming and prone to error.

Amazon Textract can extract the data from these documents, whether they are scanned images, PDFs, or other scanned documents, using Optical Character Recognition. It identifies not only each character and word but also the contents of fields in forms and the information stored in tables, with high accuracy.


"Amazon Textract helped us support 80% of PPP applicants to receive a fully automated lending experience and reduced approval times from multiple days to a median speed of 4 hours. By the end of the program, we became the second largest PPP lender in the nation by application volume, surpassing major US banks —serving over 297,000 small businesses, and preserving an estimated 945,000 jobs across America."

Anthony Sabelli, Head of Data Science for Kabbage

"At Change Healthcare, we believe that we can make healthcare affordable and accessible to all by improving the timeliness and quality of financial and administrative decisions. This can be achieved by the power of machine learning technology to understand more from our data. But unlocking the potential of this information can often be difficult as it's siloed in tables and forms that traditional optical character recognition hasn't been able to analyze. Amazon Textract further advances document understanding with the ability to retrieve structured data in addition to text, and now with the service becoming HIPAA compliant, we'll be able to liberate the information from millions of documents and create even more value for patients, payers, and providers."

Nick Giannasi, EVP and Chief AI Officer - Change Healthcare

"Millions of matters and case files are handled in Filevine every day. We chose Amazon Web Services because we wanted to deliver best-in-class document search solutions for our customers. Amazon Textract is fast, accurate, and scalable - it helps Filevine meet the exacting requirements of the world’s largest and most sophisticated legal organizations. With Filevine and Amazon, finding the proverbial needle in the haystack has never been easier for legal professionals."

Ryan Anderson, Chief Executive Officer - Filevine


Highly Accurate

Traditional text recognition methods detect and recognize text character by character. These methods have limitations: they do not use sequential modeling or the contextual dependencies between characters. By taking advantage of both, Textract improves OCR accuracy beyond that of traditional OCR.

Low Cost

With Amazon Textract, you pay only for what you use. There are no minimum fees and no upfront commitments. Amazon Textract charges per page, not per character or word, whether you are extracting only text or also tables and forms.
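As a rough illustration of per-page billing, the sketch below estimates a cost from a page count alone. The rates and feature names are hypothetical placeholders, not actual AWS prices; check the current pricing page before relying on any figure.

```python
from decimal import Decimal

# Hypothetical per-page rates in USD, for illustration only.
# These are NOT actual Amazon Textract prices.
HYPOTHETICAL_RATES = {
    "text": Decimal("0.002"),
    "forms_and_tables": Decimal("0.04"),
}


def estimate_cost(pages, feature, rates=HYPOTHETICAL_RATES):
    """Estimate cost as pages times a per-page rate.

    The number of characters or words on a page plays no part.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return rates[feature] * pages


if __name__ == "__main__":
    print(estimate_cost(1000, "text"))
    print(estimate_cost(1000, "forms_and_tables"))
```

Because the only input is the page count, a dense page and a sparse page cost the same under this model.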


Secure and Compliant

Textract can be used for workloads that are subject to Service Organization Control (SOC) and International Organization for Standardization (ISO) compliance, as well as PCI, HIPAA, and GDPR. This means customers in finance, healthcare, and other industries can get deep insight into the security processes and controls that protect customer data.


Get Started with Amazon Textract Today