Reinventing your document processing with optical character recognition (OCR)
This Guidance demonstrates how to implement Intelligent Document Processing to accelerate your business processes and reduce the overall costs associated with your document workflows. The process begins with documents uploaded to a storage bucket, which triggers an asynchronous Amazon Textract detection job. The extracted text is then classified and enriched using artificial intelligence and machine learning (AI/ML) technology, and the results are stored in the storage bucket. Automated validation and review steps follow, with human review facilitated through Amazon Augmented AI (Amazon A2I) when necessary. Finally, the verified data is stored in a fully managed NoSQL database service, where it is readily available for downstream applications.
Architecture Diagram
[Architecture diagram description]
Step 1
Documents are uploaded to Amazon Simple Storage Service (Amazon S3) which invokes an AWS Lambda function.
Step 2
The Lambda function starts an Amazon Textract asynchronous detection job.
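As an illustration of Steps 1 and 2, the sketch below shows a Lambda handler (Python, boto3) that reads the S3 event and starts the asynchronous Amazon Textract job, wiring the completion notification channel used in Step 3. The environment variable names are assumptions for this example, not the Guidance's published code.

```python
import os
import urllib.parse

import boto3

textract = boto3.client("textract")


def handler(event, context):
    """Invoked by S3 object-created events; starts an asynchronous Textract detection job."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
            # Textract publishes the job-completion message to this SNS topic (Step 3).
            NotificationChannel={
                "SNSTopicArn": os.environ["SNS_TOPIC_ARN"],   # assumed environment variable
                "RoleArn": os.environ["TEXTRACT_ROLE_ARN"],   # role Textract assumes to publish
            },
        )
```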
Step 3
Amazon Textract sends a completion notification to Amazon Simple Notification Service (Amazon SNS). The Amazon SNS topic sends the completion message to the Amazon Simple Queue Service (Amazon SQS) queue.
Step 4
A Lambda function is invoked by the Amazon SQS queue to process and read the Amazon Textract output.
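A minimal sketch of the Step 4 handler, assuming the standard Lambda-SQS event shape and the Textract job-completion message format; it pages through the job results and joins the detected lines into plain text for the classification step.

```python
import json

import boto3

textract = boto3.client("textract")


def handler(event, context):
    """Invoked by the SQS queue; collects the full Textract output for each completed job."""
    documents = {}
    for record in event["Records"]:
        # The SQS body wraps the SNS notification, which wraps the Textract job status.
        notification = json.loads(json.loads(record["body"])["Message"])
        if notification["Status"] != "SUCCEEDED":
            continue
        job_id, lines, token = notification["JobId"], [], None
        while True:
            kwargs = {"JobId": job_id, "NextToken": token} if token else {"JobId": job_id}
            page = textract.get_document_text_detection(**kwargs)
            lines += [b["Text"] for b in page["Blocks"] if b["BlockType"] == "LINE"]
            token = page.get("NextToken")
            if not token:
                break
        # The joined text is handed to the classification step (Step 5).
        documents[job_id] = "\n".join(lines)
    return documents
```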
Step 5
The Lambda function calls Amazon Bedrock with a classification prompt containing the document text and instructions on how to classify the document.
Step 6
The Lambda function saves raw optical character recognition (OCR) text along with the result of the classification prompt to Amazon S3.
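The following sketch combines Steps 5 and 6, assuming a hypothetical set of document classes and a Claude model called through the Amazon Bedrock converse API; the prompt wording, model ID, and S3 key layout are illustrative only.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

CLASSIFICATION_PROMPT = (
    "Classify the following document as one of: invoice, receipt, bank_statement, other. "
    "Return only the label.\n\nDocument text:\n{text}"
)


def classify_and_store(document_text, bucket, key_prefix):
    # Step 5: ask the model for a single classification label.
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        messages=[{"role": "user",
                   "content": [{"text": CLASSIFICATION_PROMPT.format(text=document_text)}]}],
    )
    label = response["output"]["message"]["content"][0]["text"].strip()
    # Step 6: persist the raw OCR text and the classification result back to S3.
    s3.put_object(Bucket=bucket, Key=f"{key_prefix}/ocr.txt", Body=document_text.encode())
    s3.put_object(Bucket=bucket, Key=f"{key_prefix}/classification.json",
                  Body=json.dumps({"class": label}).encode())
    return label
```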
Step 7
The Amazon S3 bucket with the classified document invokes a Lambda function to process the document content according to the classification.
Step 8
The Lambda function calls Amazon Bedrock with enrichment prompts containing the document text and instructions on how to enrich the content and normalize the content.
Step 9
The data from Amazon Bedrock and any enriched documents are saved in an Amazon S3 bucket location.
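Steps 8 and 9 follow the same pattern; the sketch below assumes an illustrative enrichment prompt that asks the model to extract and normalize key fields as JSON before the result is written back to S3.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

ENRICHMENT_PROMPT = (
    "Extract the key fields from this {doc_class} as JSON, normalizing dates to ISO 8601 "
    "and amounts to decimal numbers.\n\nDocument text:\n{text}"
)


def enrich_and_store(document_text, doc_class, bucket, key_prefix):
    # Step 8: enrichment and normalization prompt, tailored to the document class.
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model choice
        messages=[{"role": "user",
                   "content": [{"text": ENRICHMENT_PROMPT.format(doc_class=doc_class,
                                                                  text=document_text)}]}],
    )
    enriched = response["output"]["message"]["content"][0]["text"]
    # Step 9: save the enriched, normalized output alongside the source document.
    s3.put_object(Bucket=bucket, Key=f"{key_prefix}/enriched.json", Body=enriched.encode())
    return enriched
```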
Step 10
A Lambda function is invoked from the Amazon S3 bucket. The function runs review and validation on the data using predefined rules. It also checks accuracy scores and sends the information for human review if threshold scores are not met.
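A hedged sketch of the Step 10 validation logic, assuming a simple per-field confidence rule, an illustrative threshold, and a pre-created Amazon A2I flow definition; the actual review and validation rules depend on your document types.

```python
import json
import uuid

import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 0.85  # assumed threshold score


def validate(extracted, flow_definition_arn):
    """Apply predefined rules; route low-confidence extractions to an A2I human loop."""
    failures = [name for name, field in extracted["fields"].items()
                if field.get("confidence", 1.0) < CONFIDENCE_THRESHOLD]
    if not failures:
        return "PASSED"
    # Start a human review loop against the pre-created A2I flow definition.
    a2i.start_human_loop(
        HumanLoopName=f"idp-review-{uuid.uuid4().hex[:16]}",
        FlowDefinitionArn=flow_definition_arn,
        HumanLoopInput={"InputContent": json.dumps(extracted)},
    )
    return "SENT_FOR_REVIEW"
```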
Step 11
A human completes the review in Amazon Augmented AI (Amazon A2I) and updates the appropriate information in the Amazon S3 location, which initiates another validation pass by the Lambda function.
Step 12
A Lambda function stores the extracted and verified data in an Amazon DynamoDB table.
Step 13
The Lambda function sends a notification indicating either that all rules were verified correctly or that some information needs further human review.
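Steps 12 and 13 can be as simple as the sketch below, which assumes an illustrative DynamoDB table name and partition key and publishes the validation outcome to an SNS topic.

```python
import json

import boto3

table = boto3.resource("dynamodb").Table("idp-extracted-data")  # assumed table name
sns = boto3.client("sns")


def persist_and_notify(document_id, verified_fields, topic_arn, needs_review=False):
    # Step 12: store the extracted and verified data keyed by document ID.
    table.put_item(Item={"document_id": document_id, **verified_fields})
    # Step 13: notify downstream consumers of the validation outcome.
    sns.publish(
        TopicArn=topic_arn,
        Subject="Document processing complete",
        Message=json.dumps({"document_id": document_id,
                            "status": "NEEDS_HUMAN_REVIEW" if needs_review else "VERIFIED"}),
    )
```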
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
The Intelligent Document Processing architecture can be fully deployed using infrastructure as code (IaC) methodologies. The serverless infrastructure components can be provisioned using the AWS Cloud Development Kit (AWS CDK) found below and orchestrated through AWS Step Functions, a low-code visual workflow service. This automation can be seamlessly integrated into your development pipeline, enabling rapid iteration and consistent deployments. Observability for this Guidance is achieved through Amazon CloudWatch Logs, which captures telemetry data from the AWS AI services employed, such as Amazon Textract and Amazon Comprehend.
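As one possible IaC starting point, the sketch below uses the AWS CDK in Python to wire the document bucket to the Lambda function that starts the Textract job; construct names and runtime choices are illustrative, and the CDK code published with this Guidance remains the authoritative source.

```python
from aws_cdk import Stack, aws_lambda as _lambda, aws_s3 as s3, aws_s3_notifications as s3n
from constructs import Construct


class IdpIngestStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Document landing bucket with server-side encryption.
        bucket = s3.Bucket(self, "DocumentBucket", encryption=s3.BucketEncryption.S3_MANAGED)
        # Lambda function that starts the Textract detection job on upload (Steps 1-2).
        start_fn = _lambda.Function(
            self, "StartTextractFn",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="start_textract.handler",
            code=_lambda.Code.from_asset("lambda"),
        )
        bucket.grant_read(start_fn)
        bucket.add_event_notification(s3.EventType.OBJECT_CREATED, s3n.LambdaDestination(start_fn))
```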
Security
The AI services in this Guidance support security for data both at rest and in transit. Amazon Textract, Amazon Comprehend, and Amazon Comprehend Medical support encryption at rest with Amazon S3 buckets and AWS Key Management Service (AWS KMS). In addition, Amazon Textract provides an asynchronous API, and Amazon Comprehend Medical supports in-memory data processing.
In addition, Intelligent Document Processing can be orchestrated through a serverless backend that uses AWS Identity and Access Management (IAM) for authentication and secure validation. You can also define separate access controls for each user role. For example, you can give the owner full access to all documents, but allow an operator to access only de-identified documents.
Finally, this architecture includes the capability to categorize documents accurately by using Amazon Comprehend to detect personally identifiable information (PII). Also, when you want to detect Protected Health Information (PHI), use Amazon Comprehend Medical PHI identification and redaction options to scan clinical text.
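A brief sketch of how those PII and PHI checks might look with boto3; entity types and character offsets come back from the services, and the redaction or de-identification step itself is left to your pipeline.

```python
import boto3

comprehend = boto3.client("comprehend")
comprehend_medical = boto3.client("comprehendmedical")


def find_sensitive_entities(text):
    """Detect PII in general documents and PHI in clinical text."""
    pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    phi = comprehend_medical.detect_phi(Text=text)["Entities"]
    # Return offsets so a downstream step can redact or de-identify the document.
    return [(entity["Type"], entity["BeginOffset"], entity["EndOffset"])
            for entity in pii + phi]
```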
Reliability
The Intelligent Document Processing architecture uses managed, Regional AI services provided by AWS. The reliability and availability of these services within the selected AWS Region are maintained by AWS. The inherent nature of the managed AI services helps ensure resilience to failures and high availability. Should you choose to use Amazon S3 as the scalable data store, consider enabling Amazon S3 cross-Region replication. This additional measure can further increase the reliability of this Guidance and provide access to disaster recovery options.
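If you do enable cross-Region replication, a configuration along these lines can be applied to the document bucket; the bucket names and replication role ARN are placeholders, and both buckets must have versioning enabled.

```python
import boto3

s3 = boto3.client("s3")

# Replicate all documents to a bucket in a second Region for disaster recovery.
s3.put_bucket_replication(
    Bucket="idp-documents-source",  # assumed bucket names and role ARN
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/idp-replication-role",
        "Rules": [{
            "ID": "ReplicateAllDocuments",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::idp-documents-replica"},
        }],
    },
)
```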
Performance Efficiency
The serverless and event-driven architecture of this Guidance promotes efficiency, as resources are not consumed when documents are not being processed. It can be scaled within a Region to accommodate large-scale document processing by increasing the call rates for the AI services and Lambda. You can also design a serverless, decoupled architecture with Amazon SNS and Amazon SQS for concurrent processing of multiple documents. Lastly, Intelligent Document Processing can be configured to operate in real time, with response times in seconds, or in asynchronous mode, depending on your specific requirements.
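Two illustrative knobs for scaling the decoupled processing path are the SQS batch size on the Lambda event source mapping and a reserved concurrency ceiling on the processing function; the resource names and values in this sketch are assumptions.

```python
import boto3

lambda_client = boto3.client("lambda")

# Process up to 10 Textract completion messages per invocation.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:111122223333:idp-textract-results",
    FunctionName="idp-process-textract-output",
    BatchSize=10,
)

# Cap concurrent executions so downstream AI service call rates are not exceeded.
lambda_client.put_function_concurrency(
    FunctionName="idp-process-textract-output",
    ReservedConcurrentExecutions=50,
)
```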
Cost Optimization
Intelligent Document Processing minimizes costs by using a serverless, event-driven architecture, where you only pay for the time and resources consumed during document processing. Amazon Comprehend offers options to train custom models in addition to utilizing pre-defined entity extraction capabilities. For urgent, real-time document processing requirements, the Amazon Comprehend resource endpoints can be used for custom models. However, if your use case can accommodate asynchronous or batch processing, it is recommended to use asynchronous jobs for Amazon Comprehend custom models to optimize costs.
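For the asynchronous path, a custom-model batch job can be started as in the sketch below; the recognizer, role, and S3 locations are placeholders.

```python
import boto3

comprehend = boto3.client("comprehend")

# Cost-optimized path: run the custom entity recognizer as an asynchronous batch
# job instead of keeping a real-time endpoint provisioned.
comprehend.start_entities_detection_job(
    JobName="idp-batch-entities",
    EntityRecognizerArn="arn:aws:comprehend:us-east-1:111122223333:entity-recognizer/idp-custom",
    LanguageCode="en",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/idp-comprehend-role",
    InputDataConfig={"S3Uri": "s3://idp-documents/enriched/", "InputFormat": "ONE_DOC_PER_FILE"},
    OutputDataConfig={"S3Uri": "s3://idp-documents/comprehend-output/"},
)
```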
Sustainability
By extensively using managed services and dynamic scaling capabilities, the environmental impact of the backend infrastructure supporting this Guidance is minimized. AWS managed services handle the provisioning, scaling, and maintenance of the underlying compute, storage, and networking resources, offloading the operational overhead from you and your team. Additionally, the dynamic scaling capabilities inherent in managed services and serverless architectures help ensure that resources are provisioned and utilized only when needed to process incoming workloads, preventing over-provisioning and optimizing the environmental footprint of the backend services powering this Guidance.
Related Content
Intelligent document processing with AWS AI services: Part 1
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.