AWS Architecture Blog

Introducing the Well-Architected Framework for Machine Learning

We have published a new whitepaper, Machine Learning Lens, to help you design your machine learning (ML) workloads following cloud best practices. This whitepaper gives you an overview of the iterative phases of ML and introduces you to the ML and artificial intelligence (AI) services available on AWS using scenarios and reference architectures.

How often are you asking yourself “Am I doing this right?” when building and running applications in the cloud? You are not alone. To help you answer this question, we released the AWS Well-Architected Framework in 2015. The Framework is a formal approach for comparing your workload against AWS best practices and getting guidance on how to improve it.

We added “lenses” in 2017 that extended the Framework beyond its general perspective to provide guidance for specific technology domains, such as the Serverless Applications Lens, High Performance Computing (HPC) Lens, and IoT (Internet of Things) Lens. The new Machine Learning Lens further extends that guidance to include best practices specific to machine learning workloads.

A question we often hear is: “Should I use both the Lens whitepaper and the AWS Well-Architected Framework whitepaper when reviewing a workload?” Yes! We purposely exclude topics covered by the Framework in the Lens whitepapers. To fully evaluate your workload, use the Framework along with any applicable Lens whitepapers.

Applying the Machine Learning Lens

In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. The whitepaper starts by describing the general design principles for ML workloads. We then discuss the design principles for each of the five pillars of the Framework—operational excellence, security, reliability, performance efficiency, and cost optimization—as they relate to ML workloads.

Although the Lens is written to provide guidance across all pillars, each pillar is designed to be consumable by itself. This design results in some intended redundancy across the pillars. The following figure shows the structure of the whitepaper.

Figure: Structure of the Machine Learning Lens whitepaper

The primary components of the AWS Well-Architected Framework are Pillars, Design Principles, Questions, and Best Practices. These components are illustrated in the following figure and are outlined in the AWS Well-Architected Framework whitepaper.

Figure: Components of the AWS Well-Architected Framework

The Machine Learning Lens follows this pattern, with Design Principles, Questions, and Best Practices tailored for machine learning workloads. Each pillar has a set of questions, mapped to the design principles, that drive best practices for ML workloads.

Figure: Design principles, questions, and best practices mapped within each pillar of the Machine Learning Lens
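
To make that structure concrete, here is a minimal sketch of how the hierarchy could be modeled as data. The pillar and component names come from the Framework; the example design principles, question, and best practices are hypothetical placeholders rather than quotes from the whitepaper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Question:
    """A review question and the best practices that address it."""
    text: str
    best_practices: List[str] = field(default_factory=list)

@dataclass
class Pillar:
    """One Well-Architected pillar with its design principles and questions."""
    name: str
    design_principles: List[str]
    questions: List[Question]

# Hypothetical entry; the actual questions and best practices are in the whitepaper.
security = Pillar(
    name="Security",
    design_principles=["Restrict access to ML systems", "Protect sensitive data"],
    questions=[
        Question(
            text="How do you control access to your ML training data?",
            best_practices=[
                "Apply least-privilege access policies",
                "Encrypt data at rest and in transit",
            ],
        )
    ],
)
```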

To review your ML workloads, start by answering the questions in each pillar. Identify opportunities for improvement as well as critical items for remediation. Then make a prioritized plan to address the improvement opportunities and remediation items.
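
As a rough illustration of that workflow, the sketch below records a finding per question and sorts the open items into a simple remediation plan. The severity labels and helper names are assumptions made for this example, not terminology from the Lens.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity ordering used to prioritize findings; not Lens terminology.
SEVERITY_ORDER = {"critical": 0, "improvement": 1, "acceptable": 2}

@dataclass
class Finding:
    pillar: str
    question: str
    severity: str   # "critical", "improvement", or "acceptable"
    notes: str = ""

def prioritized_plan(findings: List[Finding]) -> List[Finding]:
    """Return open items ordered so critical remediation comes before improvements."""
    return sorted(
        (f for f in findings if f.severity != "acceptable"),
        key=lambda f: SEVERITY_ORDER[f.severity],
    )

# Example review output with hypothetical answers.
findings = [
    Finding("Reliability", "How do you recover from training-pipeline failures?", "improvement"),
    Finding("Security", "How do you control access to your ML training data?", "critical",
            "Training bucket is readable by all team roles."),
]

for item in prioritized_plan(findings):
    print(f"[{item.severity}] {item.pillar}: {item.question}")
```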

We recommend that you regularly evaluate your workloads. Use the Lens throughout the lifecycle of your ML workload—not just at the end when you are about to go into production. When used during the design and implementation phase, the Machine Learning Lens can help you identify potential issues early, so that you have more time to address them.

Available now

The Machine Learning Lens is available now. Start using it today to review your existing workloads or to guide you in building new ones. Use the Lens to ensure that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, and cost optimization in mind.

Shelbee Eigenbrode

Shelbee Eigenbrode is a Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She’s been in technology for 23 years, spanning multiple industries and technologies as well as multiple roles. She is currently focused on combining her background in DevOps and machine learning with the delivery and management of machine learning workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is based in Denver, CO. In her spare time, she likes to spend time with her family, friends, and overactive dog.

Bardia Nikpourian

Bardia Nikpourian is a Senior Specialist Cloud Technical Account Manager at Amazon, where he leads the company’s global AI/ML services for Enterprise Support. He graduated from Arizona State University in 2012 and now teaches a Master’s-level machine learning course (CSE 575) there. In his free time he enjoys theater and the arts, and he serves on the Board of Directors for the Herberger Theater Center in Arizona.

Sireesha Muppala

Sireesha Muppala is an AI/ML Specialist Solutions Architect at AWS, providing guidance to customers on architecting and implementing machine learning solutions at scale. She received her Ph.D. in Computer Science from University of Colorado, Colorado Springs. In her spare time, Sireesha loves to run and hike Colorado trails.

Christian Williams

Christian Williams is a Machine Learning Specialist Solutions Architect at AWS, where he focuses on building cloud-based machine learning solutions for global enterprise customers. He received his Master of Science degree in Systems Engineering from the Frank R. Seaver College of Science and Engineering at Loyola Marymount University. He is a member of the Computer Science Advisory Board at Santa Monica College, and in his spare time, he enjoys composing and playing music.