What does this AWS Solutions Implementation do?

The Document Understanding Solution (DUS) provides an easy-to-use web application that ingests and analyzes files, extracts text from documents, identifies structural data (tables and key-value pairs), extracts critical information (entities), and creates smart search indexes from the data. You can also upload files directly to, and access analyzed files from, an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account.

This solution uses AWS artificial intelligence (AI) services to address business problems common across industry verticals:

  • Search and discovery: Search for information across multiple scanned documents, PDFs, and images
  • Compliance: Redact information from documents
  • Workflow automation: Plugs easily into your existing upstream and downstream applications


AWS Solutions Implementation overview

The diagram below presents the architecture you can automatically deploy using the solution's implementation guide and accompanying AWS CloudFormation template.

Figure: Document Understanding Solution architecture

The AWS CloudFormation template deploys a static web application hosted in an Amazon S3 bucket and served by an Amazon CloudFront distribution. Users are authenticated with Amazon Cognito. The web application communicates with the backend through an Amazon API Gateway API backed by an AWS Lambda function. Documents are uploaded either through the web application or directly to a dedicated Amazon S3 bucket for bulk processing.

Document processing is initiated by the API, which invokes a Lambda function that adds an entry to an Amazon DynamoDB table. The new table entry triggers a second Lambda function that supervises the processing. The file format of the upload determines the processing route. Amazon Textract extracts text and structural information from the files, and the extracted text is then passed to Amazon Comprehend and Amazon Comprehend Medical for further analysis.
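The supervising step can be pictured as a Lambda handler that reacts to new table entries and picks a processing route from the file extension. The sketch below is a hypothetical illustration, not the solution's code: it assumes the trigger arrives as a DynamoDB Streams event whose stream view includes new images, and the `objectKey` attribute, the route names, and the extension-to-route mapping are all invented for this example.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Any

# Hypothetical mapping from file extension to processing route (an assumption).
ROUTES = {
    ".jpg": "sync-image",
    ".jpeg": "sync-image",
    ".png": "sync-image",
    ".pdf": "async-pdf",
}


def route_for(object_key: str) -> str:
    """Pick a processing route from the uploaded file's extension."""
    suffix = PurePosixPath(object_key).suffix.lower()
    if suffix not in ROUTES:
        raise ValueError(f"unsupported file format: {object_key!r}")
    return ROUTES[suffix]


def handler(event: dict[str, Any], context: Any = None) -> list[tuple[str, str]]:
    """Supervisor-style handler for DynamoDB Streams events (sketch).

    Only newly inserted items are routed. "objectKey" is a hypothetical
    attribute name; NewImage is present only if the stream view type
    includes new images.
    """
    routed = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events
        new_image = record["dynamodb"]["NewImage"]
        key = new_image["objectKey"]["S"]  # stream images use typed attribute values
        routed.append((key, route_for(key)))
    return routed


if __name__ == "__main__":
    event = {
        "Records": [
            {"eventName": "INSERT",
             "dynamodb": {"NewImage": {"objectKey": {"S": "uploads/scan.PNG"}}}},
            {"eventName": "MODIFY",
             "dynamodb": {"NewImage": {"objectKey": {"S": "uploads/scan.PNG"}}}},
            {"eventName": "INSERT",
             "dynamodb": {"NewImage": {"objectKey": {"S": "uploads/report.pdf"}}}},
        ]
    }
    print(handler(event))
    # [('uploads/scan.PNG', 'sync-image'), ('uploads/report.pdf', 'async-pdf')]
```

Filtering on `INSERT` keeps later updates to the same item (for example, status changes written back by the pipeline) from re-triggering processing.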

The resulting analyses are stored in an Amazon S3 bucket, and the metadata is stored in a DynamoDB table. The extracted information is used to index the document in Amazon Elasticsearch Service (Amazon ES) and, if enabled, in Amazon Kendra.
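As an illustration of turning extracted information into something indexable, the sketch below merges document text with entities into a single JSON-style document. It assumes entities in the shape returned by Amazon Comprehend's DetectEntities (`Type`, `Text`, `Score` fields); the output field names (`documentId`, `content`, `entities`) and the `min_score` threshold are hypothetical, and the sketch does not show how the solution actually writes to Amazon ES or Amazon Kendra.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_index_document(
    document_id: str,
    text: str,
    entities: List[Dict[str, Any]],
    min_score: float = 0.5,
) -> Dict[str, Any]:
    """Combine extracted text and entities into one searchable document.

    Output field names are hypothetical; min_score is an assumed
    confidence threshold for keeping an entity.
    """
    by_type: Dict[str, set] = defaultdict(set)
    for entity in entities:
        if entity["Score"] >= min_score:
            by_type[entity["Type"]].add(entity["Text"])
    return {
        "documentId": document_id,
        "content": text,
        # Deduplicated, sorted entity values grouped by entity type.
        "entities": {t: sorted(v) for t, v in sorted(by_type.items())},
    }


if __name__ == "__main__":
    entities = [
        {"Type": "ORGANIZATION", "Text": "AnyCompany", "Score": 0.98},
        {"Type": "DATE", "Text": "April 2021", "Score": 0.95},
        {"Type": "ORGANIZATION", "Text": "AnyCompany", "Score": 0.91},
        {"Type": "PERSON", "Text": "Jane", "Score": 0.30},
    ]
    doc = build_index_document("doc-001", "AnyCompany signed in April 2021.", entities)
    print(doc["entities"])
    # {'DATE': ['April 2021'], 'ORGANIZATION': ['AnyCompany']}
```

Grouping entities by type lets a search front end offer faceted filtering (for example, "show documents mentioning this organization") in addition to full-text search over `content`.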

Document Understanding Solution

Version 1.0.2
Release date: 04/2021
Author: AWS

Estimated deployment time: 30-60 min


Features

Search and discovery

Search for information across multiple scanned documents, PDFs, and images.

Leverage AWS AI services

Use Amazon Textract to extract text and structural information from the files, and then pass the extracted text to Amazon Comprehend and Amazon Comprehend Medical for deeper analysis.
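The hand-off from Amazon Textract to Amazon Comprehend can be sketched as two steps: collect the `LINE` blocks from a Textract response, then pack the lines into chunks small enough for a single Comprehend request. The `Blocks`, `BlockType`, and `Text` fields are part of Textract's response format; the 5,000-byte default for `max_bytes` is only an assumed placeholder, so check the current Amazon Comprehend quotas for the API you call. The sketch handles a single response and ignores pagination of asynchronous results.

```python
from typing import Any, Dict, List


def lines_from_textract(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def chunk_lines(lines: List[str], max_bytes: int = 5000) -> List[str]:
    """Join lines with newlines into chunks of at most max_bytes UTF-8 bytes.

    max_bytes is an assumed request-size limit, not an official value.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError("a single line exceeds max_bytes")
        # Joining onto a non-empty chunk costs one extra byte for "\n".
        added = line_bytes + (1 if current else 0)
        if current and size + added > max_bytes:
            chunks.append("\n".join(current))
            current, size, added = [], 0, line_bytes
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    response = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Invoice 42"},
            {"BlockType": "WORD", "Text": "Invoice"},
            {"BlockType": "LINE", "Text": "Total: $100"},
        ]
    }
    lines = lines_from_textract(response)
    print(chunk_lines(lines))
    # ['Invoice 42\nTotal: $100']
```

Splitting on line boundaries rather than at arbitrary byte offsets avoids cutting an entity (such as a name or date) in half across two requests.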

Compliance

Redact information from documents.
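The solution's exact redaction mechanism is not described here, but one common approach is to replace detected entity spans with a placeholder. The sketch below assumes entities with `Type`, `BeginOffset`, and `EndOffset` fields, as returned by Amazon Comprehend, and treats the offsets as character positions in the analyzed text. The `redact` function, its `types` filter, and the `[TYPE]` mask format are hypothetical choices for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def redact(
    text: str,
    entities: List[Dict[str, Any]],
    types: Optional[Iterable[str]] = None,
    mask: str = "[{type}]",
) -> str:
    """Replace entity spans in text with a mask such as "[PERSON]".

    If types is given, only entities of those types are redacted.
    Spans overlapping an already-redacted span are skipped.
    """
    wanted = None if types is None else set(types)
    spans = sorted(
        (e for e in entities if wanted is None or e["Type"] in wanted),
        key=lambda e: e["BeginOffset"],
    )
    out: List[str] = []
    cursor = 0
    for entity in spans:
        start, end = entity["BeginOffset"], entity["EndOffset"]
        if start < cursor:
            continue  # overlaps a span that was already masked
        out.append(text[cursor:start])
        out.append(mask.format(type=entity["Type"]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    text = "Patient John Smith visited on 2021-04-01."
    entities = [
        {"Type": "PERSON", "BeginOffset": 8, "EndOffset": 18},
        {"Type": "DATE", "BeginOffset": 30, "EndOffset": 40},
    ]
    print(redact(text, entities))
    # Patient [PERSON] visited on [DATE].
    print(redact(text, entities, types={"DATE"}))
    # Patient John Smith visited on [DATE].
```

Building the output left to right from sorted spans keeps every original offset valid, which would not hold if the text were modified in place one entity at a time.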