Document Understanding Solution

What does this AWS Solution do?

The Document Understanding Solution delivers an easy-to-use web application that ingests and analyzes files, extracts text from documents, identifies structural data (tables, key-value pairs), extracts critical information (entities), and creates smart search indexes from the data. You can also upload files directly to an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account and access the analyzed files from that bucket.

This solution uses AWS artificial intelligence (AI) services to address business problems common to many industry verticals:

  • Search and discovery: Search for information across multiple scanned documents, PDFs, and images
  • Compliance: Redact information from documents
  • Workflow automation: Plug the solution into your existing upstream and downstream applications


AWS Solution overview

The diagram below presents the architecture you can automatically deploy using the solution's implementation guide and accompanying AWS CloudFormation template.

Document Understanding Solution architecture

The AWS CloudFormation template deploys a static web application hosted in an Amazon S3 bucket and served by an Amazon CloudFront distribution. Users are authenticated using Amazon Cognito. The web application interacts with the backend through an Amazon API Gateway API, supported by an AWS Lambda function. Documents are uploaded either through the web application or directly to a dedicated Amazon S3 bucket for bulk processing. The API starts document processing by invoking a Lambda function that adds an entry to an Amazon DynamoDB table. The table then triggers a second Lambda function that supervises the processing. The file format of the upload dictates the route for processing. Amazon Textract extracts text and structural information from the files. The extracted text is then passed to Amazon Comprehend and Amazon Comprehend Medical for further analysis.
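As a rough illustration of how the file format of an upload could dictate the processing route, the sketch below maps an object key's extension to a route name. The extension sets, the route names, and the function `choose_route` are assumptions made for this example; they are not taken from the solution's source code.

```python
from pathlib import PurePosixPath

# Hypothetical routing table: these extensions and route names are
# illustrative assumptions, not the solution's actual configuration.
SYNC_EXTENSIONS = {".png", ".jpg", ".jpeg"}
ASYNC_EXTENSIONS = {".pdf"}


def choose_route(object_key: str) -> str:
    """Return a processing route name for an uploaded S3 object key."""
    # S3 keys use forward slashes, so PurePosixPath parses them on any OS.
    suffix = PurePosixPath(object_key).suffix.lower()
    if suffix in SYNC_EXTENSIONS:
        return "sync"    # single images: analyze in one request
    if suffix in ASYNC_EXTENSIONS:
        return "async"   # multi-page files: start a job, collect results later
    return "unsupported"
```

A supervising function could call `choose_route("uploads/scan.png")` and dispatch on the returned name, keeping the format decision in one place.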

The resulting analyses are stored in an Amazon S3 bucket, and the metadata is stored in a DynamoDB table. Extracted information is used to index the document in Amazon OpenSearch Service and, if activated, in Amazon Kendra.


Version 1.0.6
Release date: 11/2022
Author: AWS

Estimated deployment time: 30-60 min



Search and discovery

Search for information across multiple scanned documents, PDFs, and images.

Leverage AWS AI services

Use Amazon Textract to extract text and structural information from the files and then pass to Amazon Comprehend and Amazon Comprehend Medical for deeper analysis.
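The hand-off from text extraction to text analysis can be sketched as follows, assuming a Textract-style response: a dictionary with a `Blocks` list whose `LINE` blocks carry a `Text` field. The helper `lines_from_textract` and the sample response are hypothetical and exist only for illustration; no AWS call is made here.

```python
def lines_from_textract(response: dict) -> str:
    """Join the text of all LINE blocks into one newline-separated string."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)


# Made-up sample in the shape of a text-detection response (illustrative only).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Patient: Jane Doe"},
        {"BlockType": "WORD", "Text": "Patient:"},
        {"BlockType": "LINE", "Text": "Visit date: 2022-11-01"},
    ]
}

document_text = lines_from_textract(sample_response)
```

The resulting `document_text` is the kind of plain string that an entity-detection service would then receive as input.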


Compliance

Redact information from documents.
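One way redaction could work is to mask the character spans that entity detection reports. The sketch below assumes entities shaped like Amazon Comprehend's output (`Type`, `BeginOffset`, `EndOffset`); the function `redact`, its default mask, and the sample data are hypothetical, and this is not presented as the solution's own redaction logic.

```python
def redact(text, entities, types=frozenset({"PERSON"}), mask="[REDACTED]"):
    """Replace the spans of the selected entity types with a mask string."""
    spans = sorted(
        ((e["BeginOffset"], e["EndOffset"]) for e in entities if e["Type"] in types),
        reverse=True,
    )
    # Work from the end of the text backwards so earlier offsets stay valid.
    # Overlapping spans are not handled in this sketch.
    for begin, end in spans:
        text = text[:begin] + mask + text[end:]
    return text


# Made-up sample entities for illustration only.
sample_text = "Jane Doe visited Seattle."
sample_entities = [
    {"Type": "PERSON", "BeginOffset": 0, "EndOffset": 8},
    {"Type": "LOCATION", "BeginOffset": 17, "EndOffset": 24},
]

redacted = redact(sample_text, sample_entities)
```

Passing a different `types` set, such as `{"PERSON", "LOCATION"}`, masks more categories without changing the function.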