What does this AWS Solution do?
The Document Understanding Solution (DUS) delivers an easy-to-use web application that ingests and analyzes files, extracts text from documents, identifies structural data (tables and key-value pairs), extracts critical information (entities), and creates smart search indexes from the data. Files can also be uploaded directly to, and analyzed files accessed from, an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account.
This solution uses AWS artificial intelligence (AI) services to address business problems that apply to various industry verticals:
- Search and discovery: Search for information across multiple scanned documents, PDFs, and images
- Compliance: Redact information from documents
- Workflow automation: Plug into your existing upstream and downstream applications
AWS Solution overview
The diagram below presents the architecture you can automatically deploy using the solution's implementation guide and accompanying AWS CloudFormation template.

Document Understanding Solution architecture
The AWS CloudFormation template deploys a static web application hosted on an Amazon S3 bucket and served by an Amazon CloudFront distribution. Users are authenticated using Amazon Cognito. The web application interacts with the backend using an Amazon API Gateway API, supported by an AWS Lambda function. Documents are uploaded either through the web application or directly to a dedicated Amazon S3 bucket for bulk processing. Document processing is initiated by the API, which triggers a Lambda function to add an entry to an Amazon DynamoDB table. The table triggers a second Lambda function that supervises the processing. The file format of the upload dictates the route for processing. Amazon Textract extracts text and structural information from the files. The extracted text is then passed to Amazon Comprehend and Amazon Comprehend Medical for further analysis.
The resulting analyses are stored in an Amazon S3 bucket and the metadata is stored in a DynamoDB table. Extracted information is used to index the document in Amazon OpenSearch Service and, if enabled, in Amazon Kendra.
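The text-extraction and entity-analysis step in the flow above can be illustrated with a minimal Python sketch. It is not the solution's actual code. The function names, the returned dictionary shape, and the English language code are assumptions made for illustration. The two client calls, `detect_document_text` and `detect_entities`, are the standard boto3 operations for Amazon Textract and Amazon Comprehend, and the clients are passed in as arguments so the logic can be exercised without AWS credentials.

```python
from typing import Any


def extract_lines(textract_response: dict[str, Any]) -> list[str]:
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def analyze_document(textract: Any, comprehend: Any, bucket: str, key: str) -> dict[str, Any]:
    """Extract text from a document in S3, then detect entities in that text.

    `textract` and `comprehend` are expected to behave like
    boto3.client("textract") and boto3.client("comprehend").
    The returned dictionary shape is an assumption of this sketch.
    """
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = extract_lines(response)
    text = "\n".join(lines)

    # Comprehend rejects empty input, so skip the call when nothing was extracted.
    entities: list[dict[str, Any]] = []
    if text:
        # LanguageCode="en" is an assumption; Comprehend also limits input size,
        # so long documents would need to be split before this call.
        entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]

    return {"lines": lines, "entities": entities}
```

With real clients, a call might look like `analyze_document(boto3.client("textract"), boto3.client("comprehend"), "my-bucket", "scan.png")`, where the bucket and object names are placeholders. The synchronous Textract call shown here suits single-page inputs; multi-page documents would typically go through Textract's asynchronous operations instead.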
Document Understanding Solution
Version 1.0.3
Release date: 11/2021
Author: AWS
Estimated deployment time: 30-60 min
Features
Search and discovery
Leverage AWS AI services
Compliance
