AWS Architecture Blog

Announcing the updated AWS Well-Architected Machine Learning Lens

We are excited to announce the updated AWS Well-Architected Machine Learning Lens, now enhanced with the latest capabilities and best practices for building machine learning (ML) workloads on AWS.

The AWS Well-Architected Framework provides architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable workloads in the cloud. The Machine Learning Lens uses the Well-Architected Framework to outline the steps for performing a comprehensive review of your ML architectures.

The updated Machine Learning Lens provides a consistent approach for customers to evaluate architectures across ML workloads, from traditional supervised and unsupervised learning to modern AI applications. This lens addresses common considerations relevant to the complete ML lifecycle, including business goal identification, problem framing, data processing, model development, deployment, and monitoring. The lens incorporates the latest AWS ML services and capabilities introduced since 2023, providing access to current best practices and implementation guidance.

The Machine Learning Lens is part of a collection of Well-Architected lenses published under AWS Well-Architected Lenses.

What is the Machine Learning Lens?

The Well-Architected Machine Learning Lens focuses on the six pillars of the Well-Architected Framework across six phases of the ML lifecycle.

The six phases are:

  1. Business goal identification: Establishing clear business objectives and success criteria for your ML initiative.
  2. ML problem framing: Translating business problems into well-defined ML problems with appropriate metrics.
  3. Data processing: Collecting, preparing, and engineering features from your data sources.
  4. Model development: Building, training, tuning, and evaluating ML models with proper experimentation tracking.
  5. Model deployment: Deploying models into production environments with appropriate infrastructure and monitoring.
  6. Model monitoring: Continuously monitoring model performance and maintaining model quality over time.
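
As a toy illustration of how these phases connect, the sketch below walks a trivial "model" through goal identification, problem framing, data processing, development, and evaluation. All function names, the dataset, and the accuracy target are hypothetical, for illustration only:

```python
# Illustrative sketch of the ML lifecycle phases as a loop.
# Everything here is a toy stand-in, not an AWS API.

def identify_business_goal():
    # Phase 1: business goal and success criterion
    return {"goal": "reduce churn", "target_accuracy": 0.80}

def frame_ml_problem(goal):
    # Phase 2: translate the business goal into an ML problem and metric
    return {"task": "binary classification", "metric": "accuracy", **goal}

def process_data():
    # Phase 3: collect and prepare data (toy dataset: feature -> label)
    return [(x, 1 if x > 5 else 0) for x in range(10)]

def develop_model(data, threshold):
    # Phase 4: "train" a trivial threshold model (ignores data on purpose)
    return lambda x: 1 if x > threshold else 0

def evaluate(model, data):
    # Used in phases 4 and 6: measure accuracy
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)

# Iterate until the model meets the business target; phases 5-6 would
# deploy and monitor the result, here we just re-tune the threshold.
problem = frame_ml_problem(identify_business_goal())
data = process_data()
best_acc, best_model = 0.0, None
for threshold in range(10):
    model = develop_model(data, threshold)
    acc = evaluate(model, data)
    if acc > best_acc:
        best_acc, best_model = acc, model

print(best_acc)  # meets the 0.80 target for this toy dataset
```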

Unlike a traditional waterfall approach, the ML lifecycle is iterative: you cycle through the six phases, refining your workload until you reach a working prototype and continuing to improve it in production. For each lifecycle phase, the lens provides a set of established, cloud-agnostic best practices organized by the Well-Architected Framework pillars.

You can also use the Well-Architected Machine Learning Lens wherever you are on your cloud journey. You can choose to apply this guidance either during the design of your ML workloads or after your workloads have entered production as a part of the continuous improvement process.

Machine Learning Lens components

The lens includes four focus areas:

  1. Well-Architected ML design principles: Ten design principles that frame the presented best practices, including assign ownership, enable reproducibility, optimize resources, and enable continuous improvement.
  2. The ML lifecycle and the Well-Architected Framework pillars: This considers all aspects of the ML lifecycle and reviews design strategies aligned to the pillars of the overall Well-Architected Framework:
    • Operational excellence: Ability to support ongoing development, run ML workloads effectively, gain insight into operations, and continuously improve processes.
    • Security: Ability to protect data, models, and ML infrastructure while taking advantage of cloud technologies to improve security posture.
    • Reliability: Ability of ML workloads to perform their intended function correctly and consistently, with automatic recovery from failure situations.
    • Performance efficiency: Ability to use computing resources efficiently for ML workloads and maintain efficiency as demand and technologies evolve.
    • Cost optimization: Ability to run ML systems to deliver business value at the lowest price point through resource optimization and automation.
    • Sustainability: Addresses the environmental impact of ML workloads, focusing on energy consumption and resource efficiency.
  3. Cloud-agnostic best practices: 100+ comprehensive best practices covering each ML lifecycle phase across the Well-Architected Framework pillars. Each best practice includes:
    • Implementation guidance: Detailed AWS implementation plans with references to current AWS ML services and capabilities.
    • Resources: Curated links to AWS documentation, blogs, videos, and code examples supporting the best practices.
  4. Related ML architecture considerations: Discussions on advanced topics including MLOps patterns, data architecture for ML, model governance strategies, and considerations for responsible AI implementation.

What else is discussed in the Machine Learning Lens?

The Machine Learning Lens also discusses the following key topics:

  • Responsible AI: Comprehensive guidance on implementing fair, explainable, and unbiased ML systems throughout the development lifecycle.
  • MLOps and automation: Best practices for implementing continuous integration, continuous deployment, and continuous training for ML workloads.
  • Data architecture for ML: Guidance on building robust data pipelines, feature stores, and data governance practices that support ML workloads at scale.
  • Model governance and lineage: Strategies for tracking model versions, maintaining audit trails, and ensuring compliance with regulatory requirements.
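
As a sketch of what model governance and lineage tracking involve, the example below is a minimal, hypothetical model registry using only the Python standard library. The class, method, and field names are illustrative assumptions, not an AWS API:

```python
# Minimal sketch of model version tracking with an audit trail.
# Hypothetical names; real workloads would use a managed model registry.
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelRegistry:
    versions: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, name, artifact_bytes, metadata):
        # A content hash gives each version a tamper-evident identity
        digest = hashlib.sha256(artifact_bytes).hexdigest()
        version = len(self.versions.get(name, [])) + 1
        entry = {"version": version, "sha256": digest, "metadata": metadata}
        self.versions.setdefault(name, []).append(entry)
        # Every action is appended to an immutable-by-convention audit log
        self.audit_log.append({
            "action": "register",
            "model": name,
            "version": version,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return entry

registry = ModelRegistry()
entry = registry.register("churn-model", b"model-weights", {"accuracy": 0.91})
print(entry["version"], entry["sha256"][:8])
```

The same pattern (content hash, monotonically increasing version, timestamped audit entries) underlies the compliance and lineage requirements the lens discusses.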

What’s new in the updated Machine Learning Lens?

The updated Machine Learning Lens incorporates the latest AWS ML capabilities and best practices introduced since 2023, including:

  • Enhanced data and AI collaborative workflows: Integrated development through Amazon SageMaker Unified Studio – MLOPS02-BP01, MLOPS01-BP01, MLOPS03-BP01, and MLOPS02-BP04.
  • AI-assisted development lifecycle: Code generation and productivity enhancement using Kiro and Amazon Q Developer – MLCOST01-BP02, MLOPS01-BP01, MLCOST03-BP02, and MLSUS05-BP02.
  • Distributed training infrastructure: Large-scale foundation model development and fine-tuning with Amazon SageMaker HyperPod – MLCOST04-BP02, MLCOST04-BP07, MLPERF06-BP05, MLSEC03-BP02, MLCOST04-BP06, MLPERF06-BP07, and MLSUS05-BP02.
  • Model customization capabilities: Knowledge distillation and fine-tuning for domain-specific applications using Amazon Bedrock with Kiro and Amazon Q Developer integration and model hub with Amazon SageMaker JumpStart – MLCOST01-BP02, MLCOST01-BP01, MLCOST03-BP02, MLSUS04-BP02, MLCOST05-BP01, and MLSUS05-BP02.
  • No-code ML development: Natural language support for building models using SageMaker Canvas with Amazon Q Developer integration – MLCOST03-BP02, MLCOST03-BP03, MLOPS01-BP01, and MLSUS05-BP02.
  • Improved bias detection: Enhanced fairness metrics in SageMaker Clarify with Model Monitor for drift detection – MLREL02-BP01, MLREL03-BP04, MLREL02-BP04, MLREL02-BP05, and MLREL02-BP02.
  • Modular inference architecture: Flexible deployment with SageMaker Inference Components and Multi-Model Endpoints – MLCOST05-BP01, MLREL01-BP01, MLSUS05-BP01, MLCOST05-BP03, and MLREL01-BP02.
  • Advanced observability: Improved debugging with SageMaker Debugger, Model Monitor, and CloudWatch across the ML lifecycle – MLOPS06-BP02, MLOPS05-BP02, MLOPS06-BP01, and MLOPS02-BP04.
  • Enhanced cost optimization: Resource management through SageMaker Training Plans, Savings Plans, and Spot Instance support – MLCOST05-BP03, MLOPS05-BP02, MLCOST06-BP01, MLCOST06-BP02, and MLCOST04-BP06.
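
To give a concrete sense of the drift detection that services like SageMaker Model Monitor automate, here is a minimal pure-Python sketch of the Population Stability Index (PSI), one common metric for comparing a baseline feature distribution against live traffic. The histograms and the 0.2 flagging threshold are illustrative assumptions; the managed service computes its own baseline statistics:

```python
# Conceptual sketch of data-drift detection via the Population Stability
# Index (PSI). Bins and thresholds below are illustrative only.
import math

def psi(baseline_counts, live_counts):
    # PSI = sum over bins of (p_live - p_base) * ln(p_live / p_base)
    base_total = sum(baseline_counts)
    live_total = sum(live_counts)
    score = 0.0
    for b, l in zip(baseline_counts, live_counts):
        p_base = max(b / base_total, 1e-6)  # guard against log(0)
        p_live = max(l / live_total, 1e-6)
        score += (p_live - p_base) * math.log(p_live / p_base)
    return score

baseline = [100, 200, 400, 200, 100]   # training-time feature histogram
stable   = [ 95, 210, 390, 205, 100]   # similar live distribution
shifted  = [300, 300, 200, 100, 100]   # drifted live distribution

print(round(psi(baseline, stable), 4))   # near 0: no drift
print(round(psi(baseline, shifted), 4))  # above 0.2: commonly flagged as drift
```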

Who should use the Machine Learning Lens?

The Machine Learning Lens is valuable for many roles across your organization. Business leaders can use this lens to understand the end-to-end implementation and business value of ML initiatives. Data scientists and ML engineers can leverage the lens to understand how to build, deploy, and maintain ML systems at scale. DevOps and platform engineers can learn how to create reliable, secure infrastructure for ML workloads. Risk and compliance leaders can understand how ML systems are implemented responsibly while adhering to regulatory and governance requirements.

Next steps

If you require support on the implementation or assessment of your ML workloads, please contact your AWS Solutions Architect or Account Representative.

Special thanks to everyone across the AWS Solution Architecture, AWS Professional Services, and Machine Learning communities who contributed to the updated Machine Learning Lens. These contributions encompassed diverse perspectives, expertise, backgrounds, and experiences in developing comprehensive guidance for ML workloads on AWS.

For additional reading, refer to the AWS Well-Architected Framework, or explore the AWS Well-Architected Generative AI Lens for guidance specific to generative AI workloads.


About the authors

Steven DeVries

Steven is a Principal Solutions Architect at AWS leading Data and AI initiatives for Automotive and Manufacturing customers. He deploys agentic workflows, builds ML pipelines, and architects generative AI applications that turn emerging technologies into business value.

Gopi Krishnamurthy

Gopi Krishnamurthy is a Senior Solutions Architect at AWS, based in New York, USA. As a machine learning specialist, Gopi is passionate about deep learning and serverless technologies.

Haleh Najafzadeh

Haleh Najafzadeh is a Principal Solutions Architect at AWS with over 25 years of experience applying scientific techniques to challenging industrial problems, sharing technology best practices that help customers architect and implement solutions at scale.