
Overview

This Guidance provides best practices for building and deploying an intelligent document processing (IDP) architecture that scales with workload demands. The provided code automates the creation of machine learning (ML) resources, which reduces the developer friction of the time-consuming, error-prone work of standing up a high-quality IDP environment. This shortens the time it takes to deliver a proof of concept for IDP workflows and helps ensure adherence to architectural best practices.

How it works

These technical details feature an architecture diagram that illustrates how to use this solution effectively. The diagram shows the key components and their interactions, giving a step-by-step overview of the architecture's structure and functionality.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

This Guidance comes with a Git repository that contains all the artifacts required to deploy the architecture. 

Read the Operational Excellence whitepaper

Amazon S3 encrypts your data by default using Amazon S3-managed encryption keys. You may also use AWS Key Management Service (AWS KMS), a managed service that allows you to use your own cryptographic keys to protect your data. Data shared between services in your account never leaves your account. You can use the Amazon Comprehend console or APIs to detect personally identifiable information (PII) in English text documents. With PII detection, you have the choice of locating the PII entities or redacting the PII entities in the text.
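The locate-or-redact choice described above can be sketched in Python. This is a minimal illustration, not the Guidance's own code: `redact_pii`, `detect_and_redact`, and the `min_score` threshold are hypothetical names introduced here. The entity fields (`Type`, `Score`, `BeginOffset`, `EndOffset`) follow the response shape of the Amazon Comprehend `DetectPiiEntities` API, and the live call assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List


def redact_pii(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected PII span with its entity type in brackets.

    `entities` is assumed to have the shape of the `Entities` list returned by
    Amazon Comprehend's DetectPiiEntities API.
    """
    kept = [e for e in entities if e["Score"] >= min_score]
    # Replace from the end of the text so earlier offsets stay valid.
    for entity in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text


def detect_and_redact(text: str, language_code: str = "en") -> str:
    """Locate PII with Amazon Comprehend, then redact it locally."""
    import boto3  # imported lazily so redact_pii works without the AWS SDK

    comprehend = boto3.client("comprehend")
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])
```

Keeping the redaction step separate from the API call means the same entity list can be used either to report where PII occurs or to mask it, depending on what the workflow needs.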

Read the Security whitepaper

The Guidance provides a separate AWS CDK component for each Lambda function and recommends this separation so that each function can be used as a microservice. The serverless, event-driven architecture, together with retry and exponential backoff features, makes this architecture scalable. The Lambda functions included in the sample code have logging enabled at the default level of "DEBUG." You can view these logs in Amazon CloudWatch, through which you can also monitor and set alarms for specific log events.
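As a general illustration of retrying with exponential backoff, the sketch below wraps any callable in a capped backoff loop with random jitter. It is an assumption-laden example rather than the sample code's implementation, which may configure retries elsewhere (for example, in the workflow definition). The name `call_with_backoff` and all of its parameters are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying failures with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt (base, 2x, 4x, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # A random delay below the ceiling spreads out simultaneous retries.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that the retry schedule can be exercised in tests without actually waiting.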

Read the Reliability whitepaper

The Guidance deploys a serverless event-driven architecture that scales according to traffic patterns.

Read the Performance Efficiency whitepaper

This Guidance and the associated workshop use AWS Cloud9 to create instances on which to install Docker and deploy the AWS CDK stacks. We recommend using the cost-saving setting that prompts the environment to auto-hibernate after thirty minutes of inactivity. The Step Functions workflow is initiated only when a document is uploaded to a particular Amazon S3 location. The workshop contains an estimate of the total cost of execution and has a clean-up section to destroy the deployed stack.
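One way to start a workflow only for uploads to a particular Amazon S3 location is a small Lambda-style function that filters S3 event records by key prefix. The following is a sketch under stated assumptions, not the trigger used in the sample code: the `uploads/` prefix, the `STATE_MACHINE_ARN` environment variable, and the function names are all hypothetical. The record layout follows the Amazon S3 event notification format, and the call that starts the execution assumes boto3 and valid credentials.

```python
import json
import os
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus

UPLOAD_PREFIX = "uploads/"  # hypothetical location that should trigger the workflow


def matching_uploads(event: Dict[str, Any], prefix: str = UPLOAD_PREFIX) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for S3 event records whose key starts with `prefix`."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            uploads.append((bucket, key))
    return uploads


def handler(event: Dict[str, Any], context: Any) -> int:
    """Start one Step Functions execution per matching upload; return the count."""
    import boto3  # imported lazily so matching_uploads works without the AWS SDK

    sfn = boto3.client("stepfunctions")
    uploads = matching_uploads(event)
    for bucket, key in uploads:
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # hypothetical variable name
            input=json.dumps({"bucket": bucket, "key": key}),
        )
    return len(uploads)
```

Because nothing runs until a matching object arrives, no workflow cost is incurred for uploads elsewhere in the bucket.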

Read the Cost Optimization whitepaper

This Guidance allows you to maximize your utilization and right-size your implementation by using Step Functions, which runs only when your documents are being processed. This allows you to use resources only when needed and reduce the energy consumption of the underlying infrastructure. By using managed services like Amazon Textract and Amazon Comprehend, you can operate at scale and share the underlying resources, which allows you to further maximize resource usage.

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.