AWS Architecture Blog

Let’s Architect! Architecting for Machine Learning

Though it seems like something out of a sci-fi movie, machine learning (ML) is part of our day-to-day lives. So often, in fact, that we may not always notice it. For example, social networks and mobile applications use ML to assess user patterns and interactions to deliver a more personalized experience.

AWS services provide many options for integrating ML into your platforms. In this post, we show you some use cases that can enhance your platforms and help you bring ML into your production systems.

Dynamic A/B testing for machine learning models with Amazon SageMaker MLOps projects

Performing A/B testing on production traffic to compare a new ML model with the old model is a recommended step after offline evaluation.

This blog post explains how A/B testing works and how it can be combined with multi-armed bandit testing to gradually shift traffic to the more effective variant during the experiment. It teaches you how to build the solution with the AWS Cloud Development Kit (AWS CDK), architect your system for MLOps, and automate the deployment of the A/B testing solution.
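To illustrate the multi-armed bandit idea, here is a minimal epsilon-greedy traffic router in plain Python. This is a conceptual sketch only, not the SageMaker MLOps implementation from the post; the variant names and conversion numbers are hypothetical.

```python
import random

class EpsilonGreedyRouter:
    """Routes production traffic across model variants, shifting load
    toward the variant with the best observed reward (e.g., conversions)."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon                       # exploration rate
        self.counts = {v: 0 for v in self.variants}
        self.rewards = {v: 0.0 for v in self.variants}
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon (or before any feedback exists);
        # otherwise exploit the variant with the highest mean reward.
        if self.rng.random() < self.epsilon or not any(self.counts.values()):
            return self.rng.choice(self.variants)
        return max(self.variants,
                   key=lambda v: self.rewards[v] / max(self.counts[v], 1))

    def record(self, variant, reward):
        # Feedback loop: called once the outcome of a routed request is known.
        self.counts[variant] += 1
        self.rewards[variant] += reward

# Hypothetical feedback from an initial burst of traffic:
# model-a converted 0/10 requests, model-b converted 4/10.
router = EpsilonGreedyRouter(["model-a", "model-b"], epsilon=0.1, seed=7)
for i in range(10):
    router.record("model-a", 0.0)
    router.record("model-b", 1.0 if i % 3 == 0 else 0.0)
# Exploitation now favors model-b, while epsilon keeps a trickle of
# traffic flowing to model-a in case its performance recovers.
```

In the actual solution, variant weighting is handled by the SageMaker endpoint's production variants; the router above only illustrates the bandit logic that would decide those weights.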

This diagram shows the iterative process to analyze the performance of ML models in online and offline scenarios.

Enhance your machine learning development by using a modular architecture with Amazon SageMaker projects

Modularity is a key characteristic for modern applications. You can modularize code, infrastructure, and even architecture.

A modular architecture provides a framework that allows each development role to work on its own part of the system while hiding the complexity of integration, security, and environment configuration. This blog post provides an approach to building a modular ML workload that is easy to evolve and maintain across multiple teams.
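To make the idea concrete, the sketch below composes a tiny pipeline from independently owned steps that exchange a shared artifact dictionary. The step names and logic are illustrative assumptions, not the SageMaker projects API: the point is that any team can replace its step without touching the others.

```python
from typing import Any, Callable, Dict, List

Artifact = Dict[str, Any]
Step = Callable[[Artifact], Artifact]

def ingest(art: Artifact) -> Artifact:
    # Owned by the data engineering team (records inlined for the sketch).
    return {**art, "records": [1.0, 2.0, 3.0, 4.0]}

def featurize(art: Artifact) -> Artifact:
    # Owned by the feature engineering team: min-max normalization.
    lo, hi = min(art["records"]), max(art["records"])
    return {**art, "features": [(x - lo) / (hi - lo) for x in art["records"]]}

def train(art: Artifact) -> Artifact:
    # Owned by the ML team: a stand-in "model" that memorizes the feature mean.
    mean = sum(art["features"]) / len(art["features"])
    return {**art, "model": {"mean": mean}}

def run_pipeline(steps: List[Step], art: Artifact) -> Artifact:
    # Each step consumes and returns the shared artifact, so the contract
    # between teams is just the artifact's keys, not each other's code.
    for step in steps:
        art = step(art)
    return art

result = run_pipeline([ingest, featurize, train], {})
```

Swapping `featurize` for a different implementation leaves `ingest` and `train` untouched, which is the property the modular architecture in the post provides at the infrastructure level.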

A modular architecture allows you to easily assemble different parts of the system and replace them when needed

Automate model retraining with Amazon SageMaker Pipelines when drift is detected

The accuracy of ML models can deteriorate over time because of model drift or concept drift. This is a common challenge when deploying your models to production. Have you ever experienced it? How would you architect a solution to address this challenge?

Without metrics and automated actions, maintaining ML models in production can be overwhelming. This blog post shows you how to design an MLOps pipeline for model monitoring to detect concept drift. You can then expand the solution to automatically launch a new training job after drift is detected, so the model learns from the new samples and accounts for the changes in the data distribution.
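As a rough illustration of the detection step, the sketch below flags drift when the mean of newly collected data deviates from the training baseline by more than a few standard errors. This is a simplified stand-in for the statistics SageMaker Model Monitor computes; the data and threshold here are hypothetical.

```python
import statistics

def drift_detected(baseline, current, threshold=3.0):
    """Flags drift when the mean of newly collected data deviates from the
    baseline mean by more than `threshold` standard errors."""
    mu = statistics.mean(baseline)
    stderr = statistics.stdev(baseline) / len(current) ** 0.5
    z = abs(statistics.mean(current) - mu) / stderr
    return z > threshold

# Deterministic synthetic data for the sketch.
baseline = [i % 10 for i in range(100)]     # captured at training time
stable = [i % 10 for i in range(50)]        # same distribution: no drift
shifted = [i % 10 + 5 for i in range(50)]   # mean shifted by +5: drift

# In the real pipeline, a positive check is what would start a new
# training job on the freshly collected samples.
retrain_required = drift_detected(baseline, shifted)   # True
```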

Concept drift happens when there is a shift in the distribution. In this case, the distribution of the newly collected data (in blue) starts differing from the baseline distribution (in green)

Architect and build the full machine learning lifecycle with AWS: An end-to-end Amazon SageMaker demo

Moving from experimentation to production forces teams to move fast and automate their operations. Adopting scalable solutions for MLOps is a fundamental step to successfully create production-oriented ML processes.

This blog post provides an extended walkthrough of the ML lifecycle and explains how to optimize the process using Amazon SageMaker. Starting from data ingestion and exploration, you will see how to train your models and deploy them for inference. Then, you’ll make your operations consistent and scalable by architecting automated pipelines. This post offers a fraud detection use case so you can see how all of this can be used to put ML in production.
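To see the lifecycle end to end in miniature, the sketch below wires data preparation, training, evaluation, and a deployment gate together on a toy fraud-detection dataset. Everything here (the data, the threshold "model", the quality gate) is an illustrative assumption, not the SageMaker workflow from the demo.

```python
def prepare(raw):
    # Data preparation: split labeled transactions 80/20 into train/test.
    cut = int(len(raw) * 0.8)
    return raw[:cut], raw[cut:]

def train_model(train_set):
    # Training: a toy "model" that thresholds halfway between the mean
    # legitimate amount and the mean fraudulent amount.
    legit = [amt for amt, label in train_set if label == 0]
    fraud = [amt for amt, label in train_set if label == 1]
    threshold = (sum(legit) / len(legit) + sum(fraud) / len(fraud)) / 2
    return lambda amount: 1 if amount > threshold else 0

def evaluate(model, test_set):
    # Evaluation: fraction of held-out transactions classified correctly.
    return sum(1 for amt, label in test_set if model(amt) == label) / len(test_set)

# Hypothetical labeled transactions: (amount, label), label 1 = fraudulent.
data = [(10, 0), (900, 1), (20, 0), (950, 1), (30, 0), (1000, 1),
        (40, 0), (1100, 1), (15, 0), (920, 1)]

train_set, test_set = prepare(data)
model = train_model(train_set)
accuracy = evaluate(model, test_set)
deploy = accuracy >= 0.9   # deployment gate: ship only if quality holds
```

In the full demo, each of these stages maps to a managed SageMaker capability (processing, training, and endpoint deployment), with automated pipelines replacing the manual sequencing shown here.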

The ML lifecycle involves three macro steps: data preparation, training and tuning, and deployment with continuous monitoring.

See you next time!

Thanks for reading! We’ll see you in a couple of weeks when we discuss how to secure your workloads in AWS.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Other posts in this series

Luca Mezzalira

Luca is a Principal Solutions Architect based in London. He has authored several books and is an international speaker. He lends his expertise predominantly to the solution architecture field. Luca has gained accolades for revolutionizing the scalability of front-end architectures with micro-frontends, from increasing the efficiency of workflows to delivering quality in products.

Laura Hyatt

Laura Hyatt is a Solutions Architect for AWS Public Sector and helps Education customers in the UK. Laura helps customers not only architect and develop scalable solutions but also think big about innovative solutions to the challenges currently facing the education sector. Laura's specialty is IoT, and she is also the Alexa SME for Education across EMEA.

Vittorio Denti

Vittorio Denti is a Machine Learning Engineer at Amazon based in London. After completing his M.Sc. in Computer Science and Engineering at Politecnico di Milano (Milan) and the KTH Royal Institute of Technology (Stockholm), he joined AWS. Vittorio has a background in distributed systems and machine learning. He's especially passionate about software engineering and the latest innovations in machine learning science.

Zamira Jaupaj

Zamira is an Enterprise Solutions Architect based in the Netherlands. She is a highly passionate IT professional with over 10 years of multinational experience designing and implementing critical, complex solutions with containers, serverless, and data analytics for companies of all sizes.