Automate data processing from documents

Improve employee productivity and make faster decisions with intelligent document processing

Organizations across industries, including financial services and healthcare, process large volumes of documents. These documents, such as invoices, patient forms, loan applications, and contracts, contain data that is essential to their business processes, such as applicant names, entities (places or brands), and patient health history.

All of this data needs to be extracted from digital documents to perform tasks such as processing loan applications, analyzing customer sentiment, determining patient treatments, or filtering non-compliant purchases out of invoices. Today, organizations spend millions of dollars each year on manual extraction, which is time-consuming, error-prone, expensive, and difficult to scale.

To help overcome these challenges, AWS offers intelligent document processing solutions powered by machine learning. You can extract text from millions of documents, identify the sentiment of that text and the relationships within it, and add a human review step to validate, correct, or augment the machine learning results for higher accuracy and compliance.

Automate Data Extraction and Analysis from Documents with Machine Learning (2:41)


Higher accuracy of data

Using ML can help you process documents faster and more accurately, reducing errors caused by manual entry. When data must be 100% accurate, you can add a human review step at any point in the process.

Faster data processing

Implementing intelligent document processing can help you accomplish weeks or months of work in a matter of days.

Improved employee productivity

Machine learning removes the manual process of pulling out insights from documents and entering information into various systems, enabling your employees to spend more time on value-adding business tasks.

Cost savings

Automating document workflows simplifies data extraction and analysis, lowering the average cost per document.

Customer stories

"We strive to combine technology and expertise to help our customers understand their supply chain data. To do this, we needed a way to provide real-time classification of free-form compliance documents at scale. Our process is to extract semi-structured text from images and PDFs with forms and tables, as well as to extract custom entities within those documents. Amazon Textract's OCR technology enabled us to process the documents, while Amazon Comprehend was able to extract custom entities. We also needed to incorporate humans in our process. Using Amazon Augmented AI (Amazon A2I), we were able to have our teams review documents in a given accuracy range and help train our next model iteration. Combining these services with AppSync and Amplify provided us with more accurate insights into our customers' supply chain risk in a shorter period of time, saving our customers hundreds of hours of manually reviewing documents. They can now get immediate feedback on whether their company is at compliance risk."

Corey Peters, Senior Software Developer - Assent Compliance

“For over 25 years we have been developing advanced machine learning capabilities to mine, connect, enhance, organize and deliver information to our customers, successfully allowing them to simplify and derive more value from their work. Working with Amazon SageMaker enabled us to design a natural language processing capability in the context of a question answering application. Our solution required several iterations of deep learning configurations at scale using Amazon SageMaker's capabilities.”

Khalid Al-Kofahi, Center for AI and Cognitive Computing - Thomson Reuters

"Amazon Textract helped us enable 80% of PPP applicants to receive a fully automated lending experience and reduced approval times from multiple days to a median of 4 hours. By the end of the program, we became the second-largest PPP lender in the nation by application volume, surpassing major US banks, serving over 297,000 small businesses, and preserving an estimated 945,000 jobs across America."

Anthony Sabelli, Head of Data Science - Kabbage

“Our teams process and verify a massive volume of financial documents annually in order to provide loans and leases to our customers. In some cases, required funding documents can be inconsistent or poorly scanned. Using Amazon Augmented AI (A2I) and Amazon Textract, we are able to reduce the amount of time spent reviewing documents by up to 80%. The ability to audit the accuracy of text extracted from all of our financial documents at scale using human review workflows with A2I gives us higher confidence that our machine-learning-powered systems are delivering the highest quality possible to meet our rigorous compliance and document verification standards.”

Matthew Lewis, CTO - Dealnet Capital

Choose the right solution for your needs

AWS offers several flexible approaches to implementing a machine learning-based intelligent document processing solution that automatically extracts, processes, and analyzes data from the documents that drive your business. Organizations that want to get started today with pre-trained intelligent document processing capabilities can use fully managed services such as Amazon Textract, Amazon Comprehend, and Amazon A2I. Used together or separately, these AWS services can reduce cost and manual effort and improve your business outcomes. Organizations that want to develop their own machine learning models for intelligent document processing can use Amazon SageMaker, a fully managed service that helps data scientists and ML developers quickly build, train, and deploy custom machine learning models. SageMaker provides the tools for the entire machine learning workflow, so you can develop high-quality text processing models.

Amazon Textract

Amazon Textract is a fully managed machine learning service that automatically extracts printed text, handwriting, and data from scanned documents. It goes beyond traditional optical character recognition (OCR), which relies on manual configuration that must be updated every time a form changes: Textract accurately extracts text, forms, tables, and other data without manual effort or custom code. With Textract, you can quickly automate manual document activities and process millions of document pages in hours. Once the information is captured, you can act on it within your business applications, for example to initiate next steps for loan applications, tax documents, enrollment forms, or medical claims.
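To make the forms extraction above more concrete, here is a minimal sketch in Python. It assumes the `Blocks` response shape that the Textract `AnalyzeDocument` API returns when called with `FeatureTypes=["FORMS"]` (for example, through `boto3.client("textract").analyze_document(...)`), pairs each `KEY_VALUE_SET` key block with its linked value block, and joins their `WORD` children into strings. The sample response is hypothetical and heavily trimmed so the code runs offline without AWS credentials, and the helper names `block_text` and `extract_key_values` are illustrative, not part of any AWS SDK.

```python
from typing import Any

Block = dict[str, Any]


def block_text(block: Block, blocks_by_id: dict[str, Block]) -> str:
    """Join the text of a block's WORD children (other child types are skipped)."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict[str, Any]) -> dict[str, str]:
    """Map the text of each form key to the text of its linked value."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            # A key points at its value block(s) through a VALUE relationship.
            value_ids = [
                value_id
                for rel in block.get("Relationships", [])
                if rel["Type"] == "VALUE"
                for value_id in rel["Ids"]
            ]
            values = [block_text(blocks_by_id[v], blocks_by_id) for v in value_ids]
            pairs[block_text(block, blocks_by_id)] = " ".join(values)
    return pairs


# Hypothetical, trimmed response: real responses also include PAGE and LINE
# blocks, Geometry, Confidence, and other fields.
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Applicant"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    ]
}

print(extract_key_values(sample_response))  # {'Applicant name:': 'Jane Doe'}
```

The resulting dictionary is the kind of structured output a loan or enrollment workflow could consume directly.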

Learn more about Amazon Textract » 

Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. The service identifies the language of the text; extracts key phrases and entities such as places, people, and brands; determines the sentiment of the text; and automatically organizes a collection of text files by topic. You can train Amazon Comprehend to identify entities relevant to your organization, such as product names, part numbers, and department names. You can also train it to categorize documents or assign relevant labels to text.
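As an illustration of consuming entity results, the sketch below assumes the response shape of the Comprehend `DetectEntities` API (an `Entities` list whose items carry `Text`, `Type`, `Score`, `BeginOffset`, and `EndOffset`), as returned by `boto3.client("comprehend").detect_entities(Text=..., LanguageCode="en")`. It keeps entities at or above a confidence threshold and groups them by type. The sample text, the scores, the 0.8 threshold, and the `group_entities` helper are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Any


def group_entities(response: dict[str, Any], min_score: float = 0.8) -> dict[str, list[str]]:
    """Group entity strings by entity type, dropping detections below min_score."""
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


sample_text = "Jane Doe moved from Seattle to work at Acme."

# Hypothetical response; offsets index into sample_text.
sample_response = {
    "Entities": [
        {"Text": "Jane Doe", "Type": "PERSON", "Score": 0.99, "BeginOffset": 0, "EndOffset": 8},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.97, "BeginOffset": 20, "EndOffset": 27},
        {"Text": "Acme", "Type": "ORGANIZATION", "Score": 0.55, "BeginOffset": 39, "EndOffset": 43},
    ]
}

print(group_entities(sample_response))  # {'PERSON': ['Jane Doe'], 'LOCATION': ['Seattle']}
```

The threshold is a tuning knob: lowering it keeps more entities at the cost of more false positives, which is where a human review step can help.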

Learn more about Amazon Comprehend »

Amazon Comprehend Medical

Amazon Comprehend Medical is a natural language processing service, designed specifically for healthcare customers, that makes it easy to use machine learning to extract relevant medical information from unstructured text. With Amazon Comprehend Medical, you can quickly and accurately gather information such as medical conditions, medications, dosage, strength, and frequency from sources like doctors’ notes, clinical trial reports, and patient health records. Amazon Comprehend Medical can also link the detected information to medical ontologies such as ICD-10-CM or RxNorm, so downstream healthcare applications can use it easily.
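The sketch below shows how medication details could be flattened into rows, assuming the response shape of the Comprehend Medical `DetectEntitiesV2` API (returned by `boto3.client("comprehendmedical").detect_entities_v2(Text=...)`), where `MEDICATION` entities carry an `Attributes` list with types such as `DOSAGE`, `STRENGTH`, and `FREQUENCY`. The sample note, the trimmed response, and the `summarize_medications` helper are hypothetical.

```python
from typing import Any


def summarize_medications(response: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten MEDICATION entities into one row per medication with its attributes."""
    rows = []
    for entity in response["Entities"]:
        if entity["Category"] != "MEDICATION":
            continue
        row = {"medication": entity["Text"]}
        for attribute in entity.get("Attributes", []):
            # Attribute types (DOSAGE, STRENGTH, FREQUENCY, ...) become lowercase keys;
            # if a type repeats, the last occurrence wins.
            row[attribute["Type"].lower()] = attribute["Text"]
        rows.append(row)
    return rows


sample_text = "Take 1 tablet of lisinopril 20 mg daily for hypertension."

# Hypothetical, trimmed response (real entities also carry Id, Traits, and more).
sample_response = {
    "Entities": [
        {"Text": "lisinopril", "Category": "MEDICATION", "Type": "GENERIC_NAME",
         "Score": 0.99, "BeginOffset": 17, "EndOffset": 27,
         "Attributes": [
             {"Type": "DOSAGE", "Text": "1 tablet", "Score": 0.95, "BeginOffset": 5, "EndOffset": 13},
             {"Type": "STRENGTH", "Text": "20 mg", "Score": 0.97, "BeginOffset": 28, "EndOffset": 33},
             {"Type": "FREQUENCY", "Text": "daily", "Score": 0.96, "BeginOffset": 34, "EndOffset": 39},
         ]},
        {"Text": "hypertension", "Category": "MEDICAL_CONDITION", "Type": "DX_NAME",
         "Score": 0.98, "BeginOffset": 44, "EndOffset": 56},
    ]
}

print(summarize_medications(sample_response))
# [{'medication': 'lisinopril', 'dosage': '1 tablet', 'strength': '20 mg', 'frequency': 'daily'}]
```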

Learn more about Amazon Comprehend Medical »

Amazon A2I

Amazon Augmented AI (Amazon A2I) makes it easy to build and manage human reviews for machine learning applications. Amazon A2I provides built-in human review workflows for common machine learning use cases, such as text extraction from documents. With Amazon A2I, you can send any document to a human reviewer to verify that its text, phrases, or other information were processed correctly. You can also use the results of these human reviews to retrain your machine learning model and improve its accuracy over time.
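A common pattern is to route only low-confidence results to people. The sketch below illustrates that idea for a custom workflow; it is not A2I's built-in Textract integration. The 0.90 threshold, the field dictionaries, the `InputContent` payload layout (which in practice must match your worker task template), the flow definition ARN, and the helper names are all hypothetical. It builds keyword arguments for `start_human_loop` on the `sagemaker-a2i-runtime` boto3 client but leaves the actual call commented out so the code runs offline.

```python
import json
import uuid
from typing import Any

# Hypothetical threshold: fields below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.90
# Hypothetical ARN of an A2I flow definition (human review workflow).
EXAMPLE_FLOW_ARN = "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/example-review"


def fields_needing_review(
    fields: list[dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> list[dict[str, Any]]:
    """Return only the extracted fields whose confidence falls below the threshold."""
    return [field for field in fields if field["confidence"] < threshold]


def build_human_loop_request(
    fields: list[dict[str, Any]], flow_definition_arn: str
) -> dict[str, Any]:
    """Build keyword arguments for start_human_loop on the sagemaker-a2i-runtime client."""
    return {
        # Each human loop needs a unique name; lowercase hex keeps it valid.
        "HumanLoopName": f"review-{uuid.uuid4().hex[:16]}",
        "FlowDefinitionArn": flow_definition_arn,
        # Assumed payload layout; it must match your worker task template.
        "HumanLoopInput": {"InputContent": json.dumps({"fields": fields})},
    }


fields = [
    {"key": "Applicant name", "value": "Jane Doe", "confidence": 0.98},
    {"key": "Loan amount", "value": "25,000", "confidence": 0.71},
]
to_review = fields_needing_review(fields)
request = build_human_loop_request(to_review, EXAMPLE_FLOW_ARN) if to_review else None
# With credentials and a real flow definition, you could then call:
# boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)
```

Only the "Loan amount" field falls below the threshold here, so only it would be sent for review; reviewed answers can later be collected as labeled data for retraining.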

Learn more about Amazon A2I »

Amazon SageMaker

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. SageMaker removes the heavy lifting from each step of machine learning to make it easier to develop high-quality models. It provides several built-in machine learning algorithms, such as BlazingText and Linear Learner, that you can readily use to train and deploy models for text classification and other natural language processing (NLP) tasks. You can also bring your own text processing algorithm or model, such as the popular Bidirectional Encoder Representations from Transformers (BERT), to Amazon SageMaker, or select from the hundreds of algorithms and pre-trained models available in AWS Marketplace. Additionally, with SageMaker Autopilot, organizations can use automated machine learning (AutoML) capabilities to generate text processing models easily.

With any of these options, SageMaker provides all the components you need for machine learning, including the first fully integrated development environment (IDE) for machine learning, so teams can develop and share ML models and collaborate across data science teams from a single interface. SageMaker gives organizations complete access, control, and visibility into each step of the ML workflow, including continuous monitoring for quality issues and alerts when problems are detected. SageMaker helps teams take ML models to production faster, with less effort and at lower cost, and continuously improve their models.
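For the built-in BlazingText algorithm in text classification (supervised) mode, training data is a text file with one document per line, where each line starts with a label prefixed by `__label__` and is followed by space-separated tokens. The minimal preprocessing sketch below assumes simple lowercase regex tokenization (a hypothetical choice; a real pipeline might use a proper tokenizer), and the labels, documents, and `to_blazingtext_line` helper are hypothetical. Uploading the file to Amazon S3 and launching the training job are omitted.

```python
import re


def to_blazingtext_line(label: str, text: str) -> str:
    """Format one labeled document as a BlazingText supervised-mode training line."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("labels must be non-empty and contain no whitespace")
    # Hypothetical tokenization: lowercase, then split words and punctuation apart.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return f"__label__{label} " + " ".join(tokens)


# Hypothetical labeled documents.
documents = [
    ("invoice", "Invoice #1042: total due $1,250."),
    ("contract", "This Agreement is entered into by the parties."),
]
training_lines = [to_blazingtext_line(label, text) for label, text in documents]
print("\n".join(training_lines))
```

The first document becomes `__label__invoice invoice # 1042 : total due $ 1 , 250 .`; writing one such line per document produces a training file in the expected layout.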

Learn more about Amazon SageMaker »


Learn how to overcome document processing and analysis challenges at scale with machine learning

Watch the webinar »

Building an end-to-end intelligent document processing solution using AWS

Read the blog »

Learn more about the Document Understanding Solution (DUS)

Check out the GitHub repository »

Ready to get started?

Contact us

Contact us for more information on machine learning solutions for intelligent document processing.

Contact us 
Get started on executing your document processing initiatives

The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud.

Learn more 
Find a partner

Contact the AWS Partner Network to work with our global technology and consulting partners.

Get started