AWS Machine Learning Blog

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS.

Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. By using the Framework, you will learn operational and architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable workloads in the cloud.

An IDP pipeline usually combines optical character recognition (OCR) and natural language processing (NLP) to read and understand a document and extract specific terms or words. The IDP Well-Architected Custom Lens outlines the steps for an AWS Well-Architected review, which allows you to evaluate and identify technical risks within your IDP workloads. This custom lens integrates best practices and guidance to effectively navigate and overcome common challenges in the management of IDP workloads.

This post focuses on the Operational Excellence pillar of the IDP solution. Operational excellence in IDP means applying the principles of robust software development and maintaining a high-quality customer experience to the field of document processing, while consistently meeting or surpassing service level agreements (SLAs). It involves organizing teams effectively, designing IDP systems to handle workloads efficiently, operating these systems at scale, and continuously evolving them to meet customer needs.

In this post, we start with the introduction of the Operational Excellence pillar and design principles, and then deep dive into four focus areas: organizational culture, workload design, build and release optimization, and observability. By reading this post, you will learn about the Operational Excellence pillar in the Well-Architected Framework with the IDP case study.

Design principles

For IDP workloads, operational excellence translates to the following:

  • High accuracy and low error rates in document data extraction – Precision in extracting data from documents is paramount, which minimizes errors and ensures that the information used for decision-making is trustworthy
  • Fast processing of high document volumes with low latency – Efficiency in handling large volumes of documents swiftly allows organizations to keep pace with business demands, reducing bottlenecks
  • Continuous monitoring for swift diagnosis and resolution of issues – Proactive monitoring and maintenance help in quickly identifying and resolving any interruptions in the document processing pipeline, maintaining a smooth operational flow
  • Rapid iteration to improve models and workflows – Implementing a feedback loop that facilitates constant refinement of algorithms and processes ensures the system evolves to meet emerging challenges and efficiency standards
  • Cost optimization to ensure resources align with workload demands – Strategic resource management ensures that financial investment into IDP systems yields maximum value, adjusting resources dynamically in line with fluctuating document processing demands
  • Adherence to SLAs – Meeting or exceeding the standards and turnaround times promised to customers is crucial for maintaining trust and satisfaction

Effective design strategies must be aligned with these objectives, ensuring that the IDP systems are not only technically capable but also optimized for real-world challenges. This elevates operational excellence from a backend goal to a strategic asset, one that is integral to the success of the entire enterprise. Based on the design principles of the Operational Excellence pillar, we propose the following design principles for this custom lens.

  • Align IDP SLAs with Overall Document Workflow Objectives – IDP typically functions as an integral component of the broader document workflow managed by business teams. Therefore, it is essential that the SLAs for IDP are carefully crafted as subsets of the overall document workflow SLAs. This approach ensures that the IDP’s performance expectations are in harmony with the larger workflow objectives, providing a clear and consistent standard for processing speed, accuracy, and reliability. By doing so, businesses can create a cohesive and efficient document management system that aligns with the overarching business goals and stakeholder expectations, fostering trust and dependability in the system’s capabilities.
  • Codify Operations for Efficiency and Reproducibility – By performing operations as code and incorporating automated deployment methodologies, organizations can achieve scalable, repeatable, and consistent processes. This not only minimizes the potential for human error but also paves the way for seamless integration of new data sources and processing techniques.
  • Proactively Anticipate and Plan for System Failures – Because IDP systems process a vast array of documents with varied complexities, potential issues can emerge at any stage of the document processing pipeline. You should conduct “pre-mortem” exercises to pre-emptively identify potential sources of failure so that they can be removed or mitigated. Regularly simulate failure scenarios and validate your understanding of their impact. Test your response procedures to ensure they are effective and that teams are familiar with their process. Set up regular game days to test workload and team responses to simulated events.
  • Iterate Frequently with Feedback Mechanisms – As your document processing workload evolves, ensure your operational strategies adapt in sync and look for opportunities to improve them:
    • Make frequent, small, reversible changes – Design workloads to allow components to be updated regularly to increase the flow of beneficial changes into your workload. Make changes in small increments that can be reversed if they fail, which aids in the identification and resolution of issues introduced to your environment.
    • Learn from all operational failures – Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
  • Monitor Operational Health – Ensure a shift from mere monitoring to advanced observability within your IDP framework. This entails a comprehensive understanding of the system’s health. By effectively collecting and correlating telemetry data, you can glean actionable insights, facilitating pre-emptive detection and mitigation of issues.
  • Pursue Metrics-Driven Quality and Continuous Improvement – In IDP, what gets measured gets improved. Define and track key metrics related to document accuracy, processing times, and model efficacy. It is crucial to pursue a metrics-driven strategy that emphasizes the quality of data extraction at the field level, particularly for high-impact fields. Harness a flywheel approach, wherein continuous data feedback is utilized to routinely orchestrate and evaluate enhancements to your models and processes.
  • Integrate Human Oversight for Process Effectiveness – Although automation and ML algorithms significantly advance the efficiency of IDP, there are scenarios where human reviewers can augment and enhance the outcomes, especially in situations with regulatory demands or when encountering low-quality scans. Human oversight based on confidence score thresholds can be a valuable addition.
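
Human oversight based on confidence score thresholds can be pictured as a simple routing rule. The following is a minimal sketch rather than a prescribed implementation: the field names, the sample values, and the per-field thresholds are assumptions made up for this example. Amazon Textract reports confidence on a 0–100 scale, but suitable cutoffs depend on your documents and risk tolerance.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0-100 confidence scale; tune them per field.
DEFAULT_THRESHOLD = 90.0
FIELD_THRESHOLDS = {"total_amount": 98.0, "policy_number": 95.0}


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0-100, as reported by the extraction service


def needs_human_review(field: ExtractedField) -> bool:
    """Return True when the field's confidence is below its threshold."""
    threshold = FIELD_THRESHOLDS.get(field.name, DEFAULT_THRESHOLD)
    return field.confidence < threshold


def route_document(fields):
    """Split fields into auto-accepted ones and ones queued for review."""
    accepted, review = [], []
    for field in fields:
        (review if needs_human_review(field) else accepted).append(field)
    return accepted, review


# Illustrative values only.
fields = [
    ExtractedField("total_amount", "1,250.00", 96.5),
    ExtractedField("customer_name", "Jane Doe", 93.2),
    ExtractedField("policy_number", "PN-0042", 99.1),
]
accepted, review = route_document(fields)
```

Stricter thresholds for high-impact fields keep reviewer effort focused where an extraction error would cost the most.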

Focus areas

The design principles and best practices for the Operational Excellence pillar come from what we have learned from our customers and our IDP experts. Use these as a guide when making design choices, making sure they fit well with what your business needs from the IDP solution. Applying the IDP Well-Architected Lens also helps you validate that these choices are aimed at achieving operational excellence, ensuring they meet your specific operational goals.

The following are the key focus areas for operational excellence of an IDP solution in the cloud:

  • Organizational culture – Organizational culture is pivotal in shaping how IDP projects are implemented and managed. This culture is sustained by clear SLAs that set definitive expectations for processing times and accuracy, ensuring all team members are oriented towards common goals. This is complemented by a centralized function that acts as the hub for operational excellence, consolidating best practices and steering IDP projects towards success.
  • Workload design – This involves creating a system capable of flexibly handling varying demands, optimizing for quality and accuracy in document processing, and efficiently integrating with external systems.
  • Build and release optimization – This area emphasizes the implementation of standardized DevSecOps processes. The goal is to streamline the development lifecycle and use automation to ensure smooth and rapid deployment of updates or new features. This approach aims to enhance the efficiency, security, and reliability of the IDP system development and deployment.
  • Observability – In IDP, observability is focused on comprehensive monitoring, alerting, and logging capabilities, along with managing service quotas. This involves keeping a vigilant eye on the system’s performance, setting up effective alert mechanisms for potential issues, maintaining detailed logs for analysis, and ensuring the system operates within its resource allocations.

Organizational culture

To achieve operational excellence in IDP, organizations must embed certain best practices into their culture and daily operations. The following are a few critical areas that can guide organizations in optimizing their IDP workflows:

  • Culture and operating model – Cultivate a culture that champions the strategic design, deployment, and management of IDP workloads. This should be a cultural norm, integrated into the operating model to support agility and responsiveness in document processing.
  • Business and SLA alignment – Align IDP initiatives with business objectives and SLAs. This practice ensures that document processing supports the overall business strategy and meets the performance metrics valued by stakeholders.
  • Continuous AWS training – Commit to regular training and upskilling in AWS services to enhance IDP capabilities. A well-trained team can use AWS’s evolving features for improved document processing efficiency and innovation.
  • Change management – Establish robust change management processes to navigate the IDP landscape’s dynamic nature. Effective change management supports smooth transitions and helps maintain uninterrupted IDP operations during upgrades or shifts in strategy.
  • Defined metrics for IDP success – Establish and monitor clear metrics to measure the success and impact of the IDP operations. For example, with Amazon CloudWatch, you can monitor the number of documents processed through Amazon Textract. Similarly, monitoring the volume and size of documents being uploaded into Amazon Simple Storage Service (Amazon S3) can give insights into the rate at which processing demand is increasing. Furthermore, with AWS Step Functions, you can use the built-in metrics to track the processing job success rate, offering insights into the effectiveness of the workflow orchestration.
  • Iterative improvements – Encourage a culture of feedback and iterative development to refine IDP processes. By regularly analyzing performance data and user feedback, the organization can make informed, incremental improvements to the IDP system.
  • Feedback loop from human review – Integrate a feedback loop from human review into the IDP system. This provides valuable insights that you can use to continuously improve the accuracy and effectiveness of the automated processes.
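
As an illustration of the success-rate metric mentioned above, the following sketch derives a job success rate from the execution metrics that AWS Step Functions publishes to CloudWatch in the AWS/States namespace. The helper names, the 24-hour window, and the choice to divide succeeded executions by all finished executions are assumptions for this example, not a definition taken from the lens.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/States"
FINISHED_METRICS = (
    "ExecutionsSucceeded",
    "ExecutionsFailed",
    "ExecutionsTimedOut",
    "ExecutionsAborted",
)


def metric_request(state_machine_arn, metric_name, end, hours=24):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,
        "Statistics": ["Sum"],
    }


def success_rate(counts):
    """Share of finished executions that succeeded, or None if none finished."""
    finished = sum(counts.get(name, 0) for name in FINISHED_METRICS)
    return counts.get("ExecutionsSucceeded", 0) / finished if finished else None


def fetch_success_rate(state_machine_arn):
    """Query CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work offline

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    counts = {}
    for name in FINISHED_METRICS:
        response = cloudwatch.get_metric_statistics(
            **metric_request(state_machine_arn, name, end)
        )
        counts[name] = sum(dp["Sum"] for dp in response["Datapoints"])
    return success_rate(counts)
```

Tracking this ratio over time, rather than as a single snapshot, makes it easier to see whether a workflow change helped or hurt.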

Workload design

An effective workload design is essential for successful management of intelligent document processing systems. The design must adapt to varying demands, maintain high quality and accuracy, and integrate seamlessly with other systems. The following best practices can help achieve these goals:

  • Utilizing IDP workflow stages – When designing an architecture for IDP, it is important to consider the typical stages of an IDP workflow, which may vary based on specific use cases and business needs. Common stages include data capture, document classification, document text extraction, content enrichment, document review and validation, and data consumption. By clearly defining and separating these stages in your architecture, you create a more resilient system. This approach helps in isolating different components in the event of a failure, leading to smoother operations and easier maintenance.
  • Flexible demand handling – Create a document processing system that can easily adapt to changes in demand. This ensures that as business needs shift, the system can scale up or down accordingly and continue to operate smoothly.
    • For example, when interfacing with Amazon Textract, manage throttling and dropped connections by setting the config parameter when you create the Amazon Textract client. We recommend a retry count of 5; the AWS SDK retries an operation up to this number of times before treating it as a failure. This mechanism handles throttling more effectively because it uses the SDK’s built-in exponential backoff strategy.
    • AWS might periodically update the service limits based on various factors. Stay updated with the latest documentation and adjust your throttling management strategies accordingly. For example, you can use the Amazon Textract Service Quotas Calculator to estimate the quota values that will satisfy your use case. If your application consistently runs into throttling limits, consider requesting AWS to increase your service quotas for Amazon Textract and Amazon Comprehend.
  • Quality and accuracy optimization – Maximize the precision of data extraction with Amazon Textract by preparing documents in a format conducive to high accuracy, as outlined in the Amazon Textract best practices. Take advantage of the Amazon Textract Layout feature, which is pre-trained on a diverse array of documents from various industries, including financial services and insurance. This feature simplifies data extraction by reducing the need for complex post-processing code, ultimately enhancing both quality and efficiency in document processing operations.
  • Seamless external integrations – Ensure that your IDP system can integrate efficiently with external services and systems. This provides a cohesive workflow and allows for broader functionality within the document processing pipeline. For example, review the existing architecture for modularity, identify the components that handle external system integrations, and break down the integration logic into smaller, granular functions using AWS Lambda for flexibility and scalability. Continuously seek feedback from developers and integration partners to refine and optimize the architecture. Employ strategies for decoupled operations, such as event-driven processing, where services like Amazon EventBridge can be used to capture and route events from external systems.
  • Transparent and adaptable processing – Set up clear, traceable paths for each piece of data from its origin to extraction, which builds trust in the system. Keep documentation of processing rules thorough and up to date, fostering a transparent environment for all stakeholders.
  • Enhance IDP with Amazon Comprehend Flywheel and Amazon Textract Custom Queries
    • Leverage the Amazon Comprehend flywheel for a streamlined ML process, from data ingestion to deployment. By centralizing datasets within the flywheel’s dedicated Amazon S3 data lake, you ensure efficient data management. Regular flywheel iterations guarantee models are trained with the latest data and evaluated for optimal performance. Always promote the highest-performing models to active status, and deploy endpoints synchronized with the active model, reducing manual interventions. This systematic approach, grounded in MLOps principles, drives operational excellence and assures superior model quality.
    • Additionally, with the recent introduction of the Amazon Textract Custom Queries feature, you can refine the extraction process to meet unique business requirements by using natural language questions, thereby improving accuracy for specific document types. Custom Queries simplifies the adaptation of the Amazon Textract Queries feature, eliminating the need for deep ML expertise and facilitating a more intuitive way to extract valuable information from documents.
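
The retry guidance for the Amazon Textract client can be sketched with boto3 as follows. This is a minimal example under stated assumptions: the "standard" retry mode, the default Region, and the helper names are choices made for illustration. The backoff_ceiling function only illustrates how exponential backoff grows; it is not the SDK's exact algorithm, which also applies jitter.

```python
# Retry settings passed to botocore; 5 retries as suggested above.
# The "standard" mode is an assumption for this sketch.
RETRY_SETTINGS = {"max_attempts": 5, "mode": "standard"}


def make_textract_client(region_name="us-east-1"):
    """Create a Textract client with retries configured.

    Requires boto3 and AWS credentials; the Region default is a placeholder.
    """
    import boto3
    from botocore.config import Config

    return boto3.client(
        "textract",
        region_name=region_name,
        config=Config(retries=RETRY_SETTINGS),
    )


def backoff_ceiling(attempt, base=1.0, cap=20.0):
    """Illustrative upper bound, in seconds, on the wait before a retry."""
    return min(cap, base * 2 ** attempt)


# Upper bounds for the five retries: the wait doubles each time.
ceilings = [backoff_ceiling(n) for n in range(RETRY_SETTINGS["max_attempts"])]
```

Because the wait grows with each attempt, a burst of throttled calls spreads out over time instead of hammering the service at a fixed interval.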

Build and release optimization

Streamlining the build and release processes is vital for the agility and security of IDP solutions. The following are key practices in build and release optimization, focusing on automation, continuous integration and continuous delivery (CI/CD), and security:

  • Automated deployment – Design your IDP solution using infrastructure-as-code (IaC) principles for consistent and repeatable deployments. The serverless infrastructure can be deployed with AWS Cloud Development Kit (AWS CDK) and orchestrated with a low-code visual workflow service such as AWS Step Functions.
  • CI/CD pipelines – Use tools like AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy to automate the build, test, and release phases of IDP components and models. Set up automated rollbacks to mitigate deployment risks, and integrate change tracking and governance for thorough validation before production deployment.
  • Security with AWS KMS – Operational excellence isn’t solely about efficiency; security plays an integral role as well. Specifically, for Amazon Comprehend endpoints where customer managed keys encrypt the underlying models, maintaining their integrity through AWS Key Management Service (AWS KMS) key permissions becomes vital. Use AWS Trusted Advisor to check endpoint access risks and manage KMS key permissions.
  • Seamless integration with diverse external systems – Tailor build and release pipelines to emphasize seamless integration with diverse external systems. Use AWS services and best practices to design document processing workflows to easily interface and adapt to various external requirements. This ensures consistency and agility in deployments, prioritizing operational excellence even in complex integration scenarios.
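
To make the operations-as-code idea concrete, the following sketch generates an AWS Step Functions state machine definition (Amazon States Language) for a chain of IDP stages, so the workflow can live in version control and be deployed by your IaC tool. The stage names, the "idp-" Lambda function name prefix, and the retry values are hypothetical and exist only for this example.

```python
import json

# Hypothetical stage names; each maps to a Lambda function "idp-<stage>".
STAGES = ["ClassifyDocument", "ExtractText", "EnrichContent", "ReviewAndValidate"]


def build_definition(stages, function_prefix="idp-"):
    """Return an Amazon States Language definition chaining the stages."""
    states = {}
    for index, stage in enumerate(stages):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": f"{function_prefix}{stage}",
                "Payload.$": "$",
            },
            "OutputPath": "$.Payload",
            # Retry failed tasks with exponential backoff (illustrative values).
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if index + 1 < len(stages):
            state["Next"] = stages[index + 1]
        else:
            state["End"] = True
        states[stage] = state
    return {
        "Comment": "Sketch of an IDP pipeline",
        "StartAt": stages[0],
        "States": states,
    }


definition_json = json.dumps(build_definition(STAGES), indent=2)
```

Keeping each stage as its own state mirrors the separation of workflow stages described earlier, so a failure in one stage can be retried or investigated in isolation.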


Observability

Achieving operational excellence in IDP necessitates an integrated approach where monitoring and observability play pivotal roles. The following are the key practices to ensure clarity, insight, and continuous improvement within an AWS environment:

  • Comprehensive observability – Implement a thorough monitoring and observability solution with tools like Amazon CloudWatch Logs for services such as Amazon Textract and Amazon Comprehend. This approach provides clear operational insights for all stakeholders, fostering efficient operation, responsive event handling, and a cycle of continuous improvement.
  • Amazon Comprehend Endpoint monitoring and auto scaling – Employ Trusted Advisor for diligent monitoring of Amazon Comprehend endpoints to optimize resource utilization. Adjust throughput configurations or use AWS Application Auto Scaling to align resources with demand, enhancing efficiency and cost-effectiveness.
  • Amazon Textract monitoring strategy – For operational excellence in utilizing Amazon Textract, adopt a holistic approach:
    • Use CloudWatch to monitor Amazon Textract operations, drawing insights from key metrics such as SuccessfulRequestCount, ThrottledCount, ResponseTime, ServerErrorCount, and UserErrorCount.
    • Set precise alarms based on these metrics, and integrate them with Amazon Simple Notification Service (Amazon SNS) for real-time notification of anomalies.
    • Act swiftly on these notifications, ensuring prompt issue rectification and consistent document processing efficiency. This strategy combines meticulous monitoring with proactive intervention, setting the gold standard for operational excellence.
  • Logging API calls with AWS CloudTrail – With AWS CloudTrail, you can gain visibility into API call history and user activity, crucial for operational monitoring and swift incident response. Amazon Textract and Amazon Comprehend services are integrated with AWS CloudTrail.
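
The Amazon Textract alarm strategy above could look like the following sketch, which builds the parameters for a CloudWatch alarm on ThrottledCount that notifies an Amazon SNS topic. The alarm name, the threshold of 10 throttled calls per 5 minutes, the Operation dimension value, and the topic ARN are all assumptions for this example; choose values that match your own traffic and SLAs.

```python
def throttle_alarm_params(operation, topic_arn, threshold=10, period_seconds=300):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"textract-{operation}-throttled",
        "Namespace": "AWS/Textract",
        "MetricName": "ThrottledCount",
        # Assumes metrics are filtered by the Operation dimension.
        "Dimensions": [{"Name": "Operation", "Value": operation}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # No data means no throttling, so do not treat gaps as a breach.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder ARN for illustration only.
params = throttle_alarm_params(
    "AnalyzeDocument",
    "arn:aws:sns:us-east-1:111122223333:idp-alerts",
)
```

Separating the parameter builder from the API call makes the alarm definition easy to review and unit test before anything is created in an account.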


Conclusion

In this post, we shared design principles, focus areas, foundations, and best practices for achieving operational excellence in your IDP solution. By adopting the Well-Architected Framework principles covered in this post, you can optimize your IDP workloads for operational excellence. Focus on key areas like IaC, instrumentation, observability, and continuous improvement, which will help you achieve operational excellence and ensure your IDP systems deliver business value at scale in a secure and compliant manner.

To learn more about the IDP Well-Architected Custom Lens, explore the following posts in this series:

AWS is committed to the IDP Well-Architected Lens as a living tool. As IDP solutions and related AWS AI services evolve and new AWS services become available, we will update the IDP Well-Architected Lens accordingly.

If you want to learn more about the AWS Well-Architected Framework, refer to AWS Well-Architected.

If you require additional expert guidance, contact your AWS account team to engage an IDP Specialist Solutions Architect.

About the Authors

Brijesh Pati is an Enterprise Solutions Architect at AWS. His primary focus is helping enterprise customers adopt cloud technologies for their workloads. He has a background in application development and enterprise architecture and has worked with customers from various industries such as sports, finance, energy and professional services. His interests include serverless architectures and AI/ML.

Mia Chang is an ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud with her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys hiking, board games, and brewing coffee.

Rui Cardoso is a partner solutions architect at Amazon Web Services (AWS). He focuses on AI/ML and IoT. He works with AWS Partners and supports them in developing solutions on AWS. When not working, he enjoys cycling, hiking, and learning new things.

Tim Condello is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (AWS). His focus is natural language processing and computer vision. Tim enjoys taking customer ideas and turning them into scalable solutions.

Sherry Ding is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (AWS). She has extensive experience in machine learning with a PhD degree in computer science. She mainly works with public sector customers on various AI/ML related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.

Suyin Wang is an AI/ML Specialist Solutions Architect at AWS. She has an interdisciplinary education background in Machine Learning, Financial Information Service and Economics, along with years of experience in building Data Science and Machine Learning applications that solved real-world business problems. She enjoys helping customers identify the right business questions and building the right AI/ML solutions. In her spare time, she loves singing and cooking.