This Guidance provides best practices for building and deploying an intelligent document processing (IDP) architecture that scales with workload demands. The code provided automates the creation of machine learning (ML) resources, which reduces the developer friction of standing up a high-quality IDP environment, a time-consuming and error-prone task. This shortens the time it takes to deliver a proof-of-concept for IDP workflows and helps ensure adherence to architectural best practices.

Please note: [Disclaimer]

Architecture Diagram

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • This Guidance comes with a Git repository that contains all the artifacts required to deploy the architecture. 

    Read the Operational Excellence whitepaper 
  • Amazon S3 encrypts your data by default using Amazon S3-managed encryption keys. You may also use AWS Key Management Service (AWS KMS), a managed service that allows you to use your own cryptographic keys to protect your data. Data shared between services in your account never leaves your account. You can use the Amazon Comprehend console or APIs to detect personally identifiable information (PII) in English text documents. With PII detection, you have the choice of locating the PII entities or redacting the PII entities in the text.

    Read the Security whitepaper 
  • The Guidance provides a separate AWS CDK component for each Lambda function, so that each function can be used as a microservice. The serverless, event-driven architecture, together with retry and exponential backoff features, makes this architecture scalable. The Lambda functions included in the sample code have logging enabled, with the default level set to "DEBUG." You can view these logs in Amazon CloudWatch, through which you can also monitor and set alarms for specific log events. 

    Read the Reliability whitepaper 
  • The Guidance deploys a serverless event-driven architecture that scales according to traffic patterns.

    Read the Performance Efficiency whitepaper 
  • This Guidance and the associated workshop use AWS Cloud9 to create instances to install Docker and deploy the AWS CDK stacks. We recommend using the cost-saving setting that prompts the environment to auto-hibernate after thirty minutes of no activity. The Step Functions workflow is initiated only when a document is uploaded to a particular Amazon S3 location. The workshop contains an estimate of the total cost of execution and has a clean-up section to destroy the deployed stack.

    Read the Cost Optimization whitepaper 
  • This Guidance allows you to maximize your utilization and right-size your implementation by using Step Functions, which runs only when your documents are being processed. This allows you to use resources only when needed and reduce the energy consumption of the underlying infrastructure. By using managed services like Amazon Textract and Amazon Comprehend, you can operate at scale and share the underlying resources, which allows you to further maximize resource usage.

    Read the Sustainability whitepaper 
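
The PII option described in the Security item above (locating entities or redacting them) can be illustrated with a minimal Python sketch. It is not taken from the Guidance repository. The helper names `redact_pii` and `detect_and_redact` are hypothetical, and the sketch assumes the response shape of the Amazon Comprehend `DetectPiiEntities` API, where each entity carries `BeginOffset` and `EndOffset` character positions. The redaction step is pure Python, so it can be tried on a sample entity list without AWS credentials.

```python
from typing import Dict, List


def redact_pii(text: str, entities: List[Dict], mask_char: str = "*") -> str:
    """Replace every PII span with mask characters of the same length.

    `entities` is assumed to look like the "Entities" list returned by
    Amazon Comprehend DetectPiiEntities (BeginOffset / EndOffset per entity).
    """
    chars = list(text)
    for entity in entities:
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        # Overwrite the span in place so the text length is unchanged.
        chars[begin:end] = mask_char * (end - begin)
    return "".join(chars)


def detect_and_redact(text: str, language_code: str = "en") -> str:
    """Hypothetical wrapper: locate PII with Amazon Comprehend, then redact locally.

    Requires boto3 and configured AWS credentials.
    """
    import boto3  # imported lazily so redact_pii works without the AWS SDK

    comprehend = boto3.client("comprehend")
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])


if __name__ == "__main__":
    sample = "Call Jane at 555-0100."
    sample_entities = [
        {"Type": "NAME", "BeginOffset": 5, "EndOffset": 9},
        {"Type": "PHONE", "BeginOffset": 13, "EndOffset": 21},
    ]
    print(redact_pii(sample, sample_entities))
```

Keeping the redaction separate from the API call means the same function can be applied whether you only want to locate entities (inspect the list) or also mask them in the text.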

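The retry and exponential backoff behavior mentioned in the Reliability item above can be sketched in a few lines of Python. This is an illustration of the general technique, not the Guidance's own implementation; the names `backoff_delays` and `call_with_retries`, and the default values for the base delay, growth factor, and cap, are all assumptions chosen for the example.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Return the upper bound (in seconds) of the wait before each retry.

    With the defaults the bounds grow 1, 2, 4, 8, ... and never exceed `cap`.
    """
    return [min(cap, base * factor ** i) for i in range(max_attempts - 1)]


def call_with_retries(func: Callable[[], T], max_attempts: int = 4,
                      base: float = 1.0, factor: float = 2.0, cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, retrying on any exception with jittered exponential backoff."""
    delays = backoff_delays(max_attempts, base, factor, cap)
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the current bound so that
            # many concurrent callers do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    raise RuntimeError("max_attempts must be at least 1")
```

For example, `call_with_retries(lambda: client_call(), max_attempts=5)` (where `client_call` is a placeholder for any throttled operation) would wait at most 1, 2, 4, and 8 seconds between the five attempts.
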
Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

AWS Machine Learning
Workshop

Use machine learning to automate and process documents at scale

This workshop demonstrates how to set up document processing on AWS at scale and customize it for your extraction requirements.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
