Overview

The Document Understanding Solution delivers an easy-to-use web application that ingests and analyzes files, extracts text from documents, identifies structural data (tables and key-value pairs), extracts critical information (entities), and creates smart search indexes from the data. You can also upload files directly to an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account and access the analyzed files from it.
You can upload and process documents in bulk and, optionally, enable Amazon Kendra support for machine learning-based enterprise search.
Benefits

Search for information across multiple scanned documents, PDFs, and images.
Redact information from documents.
Plug the solution into your existing upstream and downstream applications.
Use Amazon Textract to extract text and structural information from the files, then pass the results to Amazon Comprehend and Amazon Comprehend Medical for deeper analysis.
Technical details

The diagram below presents the architecture you can automatically deploy using the solution's implementation guide and accompanying AWS CloudFormation template.
The AWS CloudFormation template deploys a static web application hosted in an Amazon S3 bucket and served by an Amazon CloudFront distribution.
Step 1
Users are authenticated using Amazon Cognito. The web application interacts with the backend using an Amazon API Gateway API, supported by an AWS Lambda function.
Step 2
Documents are uploaded either through the web application or directly to a dedicated Amazon S3 bucket for bulk processing.
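As an illustration of the bulk path, the following is a minimal sketch of uploading a local folder to an S3 bucket. It assumes a boto3-style S3 client is passed in (only its `upload_file(Filename, Bucket, Key)` method is used). The function name `upload_folder`, the bucket name, and the key prefix are hypothetical; the solution's implementation guide defines the actual bucket and any expected key layout.

```python
from pathlib import Path


def upload_folder(s3_client, folder, bucket, prefix=""):
    """Upload every file under `folder` to `bucket` and return the object keys used.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    keys = []
    root = Path(folder)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        # Object key = optional prefix + path relative to the folder, with "/" separators.
        key = prefix + path.relative_to(root).as_posix()
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Example call (bucket name and prefix are placeholders, not values from the solution):
#   import boto3
#   upload_folder(boto3.client("s3"), "./scans", "my-bulk-upload-bucket", "incoming/")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real bucket.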
Step 3a
Document processing is initiated by the API, which invokes a Lambda function to add an entry to an Amazon DynamoDB table. The table, in turn, triggers a second Lambda function that supervises the processing. The file format of the upload dictates the route for processing.
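The format-based routing can be pictured as a lookup on the file extension. The sketch below is illustrative only: the route names, the set of supported extensions, and the function `choose_route` are assumptions, since the page does not describe the solution's actual routing table.

```python
from pathlib import Path

# Hypothetical mapping from file extension to processing route.
ROUTES = {
    ".pdf": "pdf_pipeline",
    ".png": "image_pipeline",
    ".jpg": "image_pipeline",
    ".jpeg": "image_pipeline",
}


def choose_route(object_key):
    """Return the (hypothetical) processing route for an uploaded object key."""
    suffix = Path(object_key).suffix.lower()  # case-insensitive match on the extension
    if suffix not in ROUTES:
        raise ValueError(f"unsupported file format: {suffix or '(none)'}")
    return ROUTES[suffix]
```

For example, `choose_route("scans/Report.PDF")` selects the PDF route, while a key without a recognized extension raises `ValueError` so the supervising function can reject it explicitly.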
Step 3b
Amazon Textract extracts text and structural information from the files. The extracted text is then passed to Amazon Comprehend and Amazon Comprehend Medical for further analysis.
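A minimal sketch of this hand-off is shown below. It assumes boto3-style clients are passed in and uses the synchronous Textract `detect_document_text` call and the Comprehend `detect_entities` call. The function names and the returned dictionary shape are hypothetical, and the sketch leaves out structural extraction (tables, key-value pairs) and Amazon Comprehend Medical; it is not the solution's own code.

```python
def extract_lines(textract_response):
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def analyze_document(textract, comprehend, bucket, key, language="en"):
    """Extract text from an S3 object with Textract, then detect entities with Comprehend.

    `textract` and `comprehend` are assumed to behave like
    boto3.client("textract") and boto3.client("comprehend").
    """
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(extract_lines(response))
    if not text:
        # Nothing was extracted, so there is nothing to send for entity detection.
        return {"text": "", "entities": []}
    result = comprehend.detect_entities(Text=text, LanguageCode=language)
    return {"text": text, "entities": result["Entities"]}
```

Only LINE blocks are joined here, so word-level and page-level blocks in the Textract response do not duplicate text in the string sent to Comprehend.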
Step 4
The resulting analyses are stored in an Amazon S3 bucket, and the metadata is stored in a DynamoDB table. Extracted information is used to index the document in Amazon OpenSearch Service and, if activated, in Amazon Kendra.
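To make the indexing step concrete, here is a hedged sketch of how extracted text and entities might be combined into a single search document. The field names (`documentId`, `objectKey`, `content`, `entities`), the `min_score` threshold, and the function `build_index_document` are all assumptions for illustration; the solution's actual index schema is not described on this page. The entity dictionaries follow the shape returned by Amazon Comprehend (`Type`, `Text`, `Score`).

```python
def build_index_document(document_id, object_key, text, entities, min_score=0.8):
    """Build a (hypothetical) search-index document from extracted text and entities."""
    by_type = {}
    for entity in entities:
        if entity.get("Score", 0.0) < min_score:
            continue  # drop low-confidence detections
        values = by_type.setdefault(entity["Type"], [])
        if entity["Text"] not in values:
            values.append(entity["Text"])  # keep each value once per entity type
    return {
        "documentId": document_id,
        "objectKey": object_key,
        "content": text,
        "entities": by_type,
    }
```

Grouping entity values by type keeps the document compact and lets a search front end filter on, for example, all documents that mention a given person or location.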
Related content

This course discusses what AI is and why it is important, takes a brief look at machine learning and deep learning (both subsets of AI), and describes how Amazon uses AI in its products.
This course introduces Amazon Machine Learning and Artificial Intelligence tools that enable capabilities across frameworks and infrastructure, machine learning platforms, and API-driven services.