This Guidance applies principles and best practices from the Sustainability Pillar of the AWS Well-Architected Framework to reduce the carbon footprint of your deep learning workloads. From data processing to model building, training and inference, this Guidance demonstrates how to maximize utilization and minimize the total resources needed to support your workloads.
Architecture Diagram
Data processing
[Architecture diagram description]
Step 1
Select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to host your data and workloads. When selecting a Region, try to minimize data movement across networks; store your data close to your producers and train your models close to your data.
Step 2
Evaluate whether you can avoid data processing by using existing publicly available datasets, such as those offered through AWS Data Exchange and Open Data on AWS, which includes the Amazon Sustainability Data Initiative (ASDI). These catalogs offer weather and climate datasets, satellite imagery, and air quality or energy data, among others. Using these curated datasets avoids duplicating the compute and storage resources needed to download the data from the providers, store it in the cloud, and organize and clean it.
Step 3
For internal data, use Amazon SageMaker Feature Store to avoid duplicating datasets and rerunning feature engineering code across teams and projects.
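As a minimal sketch of this pattern (the feature group name, IAM role, and toy DataFrame below are all hypothetical), a feature group can be created and ingested once, then reused by other teams and projects:

```python
import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Toy feature data; in practice this is the output of your feature engineering job.
df = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "spend_30d": [120.0, 80.5],
        "event_time": [time.time(), time.time()],
    }
)

feature_group = FeatureGroup(name="customer-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)  # infer the schema from the DataFrame
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",  # offline store location
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
feature_group.ingest(data_frame=df, max_workers=3, wait=True)  # ingest once, reuse everywhere
```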
Step 4
Adopt a serverless architecture for your data pipeline so it only provisions resources when work needs to be done. Use AWS Glue and AWS Step Functions for data ingestion and preprocessing, so you are not maintaining compute infrastructure 24/7. Step Functions can orchestrate AWS Glue jobs to create event-based serverless Extract, Transform, Load/Extract, Load, and Transform (ETL/ELT) pipelines.
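As a rough sketch of this pattern (the Glue job name, IAM role, and state machine name are placeholders, not part of this Guidance), a Step Functions state machine can run a Glue job synchronously so that compute exists only while the workflow runs:

```python
import json

import boto3

GLUE_JOB_NAME = "sustainability-etl-job"  # hypothetical, pre-existing Glue job
STATE_MACHINE_ROLE_ARN = "arn:aws:iam::123456789012:role/StepFunctionsGlueRole"  # placeholder

# Minimal state machine: run the Glue job synchronously, then finish.
# No compute is provisioned until the workflow is started (for example, by an event).
definition = {
    "Comment": "Event-driven serverless ETL pipeline",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": GLUE_JOB_NAME},
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
response = sfn.create_state_machine(
    name="serverless-etl-pipeline",
    definition=json.dumps(definition),
    roleArn=STATE_MACHINE_ROLE_ARN,
)
print(response["stateMachineArn"])
```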
Step 5
Use the appropriate Amazon Simple Storage Service (Amazon S3) storage tier to reduce the carbon impact of your workload. Use energy-efficient, archival-class storage for infrequently accessed data. If you can easily recreate an infrequently accessed dataset, use the Amazon S3 One Zone-IA storage class, which keeps data in a single Availability Zone and therefore minimizes the total data stored.
Manage the lifecycle of all your data and automatically enforce deletion timelines to minimize the total storage requirements of your workload using Amazon S3 Lifecycle policies. The S3 Intelligent-Tiering storage class automatically moves your data to the most sustainable access tier when access patterns change. Define data retention periods that support your sustainability goals while meeting your business requirements, not exceeding them.
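A minimal example of such a lifecycle policy, assuming a hypothetical bucket and prefix and illustrative retention periods:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ml-training-data"  # placeholder bucket name

# Transition raw training data to archival storage after 90 days and delete it
# after one year, so storage never exceeds the retention period the business needs.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-training-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```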
Model building
[Architecture diagram description]
Step 1
Define acceptable performance criteria. When you build an ML model, you’ll likely need to make trade-offs between your model’s accuracy and its carbon footprint. Establish performance criteria that support your sustainability goals while meeting your business requirements, not exceeding them.
Step 2
Evaluate whether you can use pre-existing datasets, algorithms, or models. AWS Marketplace offers over 1,400 ML-related assets that customers can subscribe to. You can also fine-tune an existing model, such as those available on Hugging Face, or use a pre-trained model from Amazon SageMaker JumpStart. Using pre-trained models can reduce the resources you need for data preparation and model training. Look for simplified versions of algorithms, which achieve a similar outcome with fewer resources. For example, DistilBERT, a distilled version of BERT, has 40% fewer parameters, runs 60% faster, and preserves 97% of BERT’s performance. Also consider techniques that avoid training a model from scratch, such as transfer learning (reusing a pre-trained source model as the starting point for a second task) or incremental training (using the artifacts of an existing model to train a new model on an expanded dataset).
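For illustration, loading the distilled checkpoint with the Hugging Face transformers library reuses the pre-trained weights instead of training from scratch; the two-class head shown here is an assumed example task, not part of this Guidance:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from the distilled, pre-trained checkpoint instead of training BERT
# from scratch; only the small classification head is newly initialized.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune `model` on your task-specific dataset with the Trainer API or any
# training loop; the pre-trained weights are reused as-is.
```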
Step 3
Use SageMaker Training Compiler to compile your deep learning models from their high-level language representation into hardware-optimized instructions, reducing training time. Training Compiler can speed up training of deep learning models by up to 50% by using SageMaker graphics processing unit (GPU) instances more efficiently.
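A hedged sketch of enabling the compiler through the SageMaker Python SDK Hugging Face estimator; the entry point, role, hyperparameters, and framework versions are placeholders and must match a combination supported by Training Compiler:

```python
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.21.1",  # illustrative; use a Training Compiler-supported combo
    pytorch_version="1.11.0",
    py_version="py38",
    hyperparameters={"epochs": 3, "train_batch_size": 24},
    compiler_config=TrainingCompilerConfig(),  # enable SageMaker Training Compiler
)
estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 input
```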
Step 4
Automate the ML environment. When building your model, use Lifecycle Configuration Scripts to automatically stop idle SageMaker Notebook instances. If you are using SageMaker Studio, install the auto-shutdown Jupyter extension to detect and stop idle resources. Use the fully managed training process provided by SageMaker to automatically launch training instances and shut them down as soon as the training job is complete. This minimizes idle compute resources, and limits the environmental impact of your training job.
Model training
[Architecture diagram description]
Step 1
Use SageMaker Debugger to identify training problems. Its built-in rules detect issues such as system bottlenecks, overfitting, and saturated activation functions, so SageMaker Debugger can monitor your training jobs and automatically stop them as soon as it detects a problem, helping you avoid unnecessary carbon emissions.
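A minimal sketch of attaching built-in rules with an automatic stop action, assuming a hypothetical training script, role, and data location:

```python
from sagemaker.debugger import Rule, rule_configs
from sagemaker.pytorch import PyTorch

# Stop the training job automatically when a rule fires, so no further compute
# (and carbon) is spent on a run that is already failing.
stop_action = rule_configs.ActionList(rule_configs.StopTraining())

rules = [
    Rule.sagemaker(rule_configs.overfit(), actions=stop_action),
    Rule.sagemaker(rule_configs.loss_not_decreasing(), actions=stop_action),
]

estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    framework_version="1.13",
    py_version="py39",
    rules=rules,
)
estimator.fit("s3://my-bucket/train")  # placeholder S3 input
```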
Step 2
Right-size your training jobs with Amazon CloudWatch metrics that monitor the utilization of resources such as CPU, GPU, memory, and disk. SageMaker Debugger also provides profiling capabilities to detect under-utilization of system resources and right-size your training environment, which helps avoid unnecessary carbon emissions.
Step 3
Use AWS Trainium to train your deep learning workloads. It is expected to be the most energy-efficient processor offered by AWS for this purpose. Consider Managed Spot Training, which takes advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity and can save you up to 90% in cost compared to On-Demand instances. By shaping your demand around the existing supply of Amazon EC2 instance capacity, you improve your overall resource efficiency and reduce idle capacity across the AWS Cloud.
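The Managed Spot Training flags look like the following sketch (script, role, S3 paths, and time limits are placeholders); checkpointing to S3 lets the job resume if Spot capacity is reclaimed. The same flags apply whichever training instance type you choose, including Trainium-based Trn1 instances with a Neuron-compatible training setup:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    framework_version="1.13",
    py_version="py39",
    use_spot_instances=True,            # run on spare EC2 capacity
    max_run=3600,                       # max training time, in seconds
    max_wait=7200,                      # max time to wait for Spot capacity (>= max_run)
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume point if interrupted
)
estimator.fit("s3://my-bucket/train")  # placeholder S3 input
```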
Step 4
Reduce the volume of logs you keep. By default, CloudWatch retains logs indefinitely. By setting limited retention time for your notebooks and training logs, you’ll avoid the carbon footprint of unnecessary log storage.
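For example, a retention policy can be applied to the default SageMaker training log group; the 30-day value here is illustrative:

```python
import boto3

logs = boto3.client("logs")

# SageMaker writes training job logs to this log group by default.
# Keeping them for 30 days instead of indefinitely reduces stored data.
logs.put_retention_policy(
    logGroupName="/aws/sagemaker/TrainingJobs",
    retentionInDays=30,
)
```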
Step 5
Adopt a sustainable tuning job strategy. Prefer Bayesian search over random search (and avoid grid search). Bayesian search makes informed guesses about the next set of hyperparameters to evaluate based on the prior set of trials. It typically requires ten times fewer jobs than random search, and therefore ten times fewer compute resources, to find the best hyperparameters.
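A minimal sketch of a Bayesian tuning job, assuming an existing `estimator` and an illustrative metric regex and hyperparameter range:

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

# `estimator` is assumed to be an existing SageMaker estimator (see earlier sketches).
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-5, 1e-2)},
    strategy="Bayesian",    # the default, shown explicitly; avoids exhaustive grid search
    max_jobs=20,            # upper bound on training jobs the tuner may launch
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train"})  # placeholder S3 input
```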
Inference
[Architecture diagram description]
Step 1
If your users can tolerate some latency, deploy your model on asynchronous inference endpoints to reduce resources that sit idle between tasks and to minimize the impact of load spikes.
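A brief sketch of deploying to an asynchronous endpoint, assuming an existing SageMaker `model` object and a placeholder output bucket:

```python
from sagemaker.async_inference import AsyncInferenceConfig

# `model` is assumed to be an existing sagemaker.model.Model.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",          # placeholder results location
    max_concurrent_invocations_per_instance=4,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=async_config,
)

# Requests are queued and processed as capacity allows; combine this with
# endpoint auto scaling (including scale-to-zero) so instances are not kept idle.
```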
Step 2
When you don’t need real-time inference, use SageMaker batch transform. Unlike persistent endpoints, the underlying cluster is decommissioned when the batch transform job finishes.
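For example, assuming an existing `model` object and placeholder S3 paths, a transient batch transform job looks like this:

```python
# `model` is assumed to be an existing sagemaker.model.Model; paths are placeholders.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-predictions/",
)

# The cluster is provisioned for this job and torn down when it completes.
transformer.transform(
    data="s3://my-bucket/batch-input/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```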
Step 3
Use Amazon EC2 Inf1 instances (based on custom-designed AWS Inferentia chips), which deliver 50% higher performance per watt than G4dn instances.
Step 4
Use Amazon Elastic Inference to attach just the right amount of GPU-powered inference acceleration to any EC2 or SageMaker instance type or Amazon Elastic Container Service (Amazon ECS) task.
Step 5
Improve efficiency of your models by compiling them into optimized forms with SageMaker Neo.
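A hedged example of compiling and deploying a model with Neo through the SageMaker Python SDK; the input shape, framework version, paths, and role are illustrative assumptions:

```python
# `model` is assumed to be an existing sagemaker.model.Model built from a
# PyTorch artifact; shapes, paths, and versions are illustrative.
compiled_model = model.compile(
    target_instance_family="ml_c5",
    input_shape={"input0": [1, 3, 224, 224]},
    output_path="s3://my-bucket/neo-compiled/",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    framework="pytorch",
    framework_version="1.13",
)

# Deploy the hardware-optimized artifact on the matching instance family.
predictor = compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)
```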
Step 6
Deploy multiple models behind a single endpoint. Sharing endpoint resources is more sustainable than deploying each model behind its own endpoint and can reduce your inference costs by up to 90 percent.
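A minimal multi-model endpoint sketch, assuming an existing `model` that supplies the container and a placeholder S3 prefix holding the individual model artifacts:

```python
from sagemaker.multidatamodel import MultiDataModel

# `model` is an existing sagemaker.model.Model providing the inference container;
# every artifact under the prefix is served from the same shared endpoint.
mme = MultiDataModel(
    name="shared-inference-endpoint",
    model_data_prefix="s3://my-bucket/model-artifacts/",  # placeholder prefix
    model=model,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Invoke one specific model hosted behind the shared endpoint.
payload = b'{"inputs": [1.0, 2.0]}'  # placeholder request payload
result = predictor.predict(data=payload, target_model="model-a.tar.gz")
```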
Step 7
Right-size your endpoints by using metrics from CloudWatch, or by using SageMaker Inference Recommender. This tool can run load-testing jobs and recommend the proper instance type to host your model.
Step 8
If your workload has intermittent or unpredictable traffic, configure auto scaling for your SageMaker inference endpoints, or use SageMaker Serverless Inference, which automatically launches compute resources and scales them in and out depending on traffic, eliminating idle resources.
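A short sketch of a serverless deployment, assuming an existing `model` object; the memory size and concurrency values are illustrative:

```python
from sagemaker.serverless import ServerlessInferenceConfig

# `model` is assumed to be an existing sagemaker.model.Model.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,   # illustrative; 1024-6144 in 1 GB increments
    max_concurrency=5,
)

predictor = model.deploy(serverless_inference_config=serverless_config)

# Compute is provisioned per request and scales to zero when there is no
# traffic, so no instances sit idle between invocations.
```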
Step 9
When working on Internet of Things (IoT) use cases, evaluate whether ML inference at the edge can reduce the carbon footprint of your workload. When deploying ML models to edge devices, use SageMaker Edge Manager, which integrates with SageMaker Neo and AWS IoT Greengrass, and reduce the size of your models for deployment with pruning and quantization.
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
CloudWatch is used to measure machine learning (ML) operations metrics to monitor the performance of the deployed environment. In the data processing phase, AWS Glue and Step Functions workflows are used to track the history of the data within the pipeline execution. In the model development phase, SageMaker Debugger provides near real-time monitoring of training jobs to detect issues and performance bottlenecks. In the deployment phase, the health of model endpoints deployed on SageMaker hosting options is monitored using CloudWatch metrics and alarms.
Security
All the proposed services support integration with AWS Identity and Access Management (IAM), which can be used to control access to resources and data. Data is stored in Amazon S3 and SageMaker Feature Store, both of which support encryption at rest using AWS Key Management Service (AWS KMS). To reduce data exposure risks, data lifecycle plans are established to remove data automatically based on age and to store only the data that has a business need.
Reliability
The customer has the option to deploy SageMaker services in a highly available manner. The AWS Glue Data Catalog tracks the data assets that have been loaded into the ML workloads. The data pipelines provide fault-tolerant, repeatable, and highly available data processing.
Performance Efficiency
Training and inference instance types are optimized using CloudWatch metrics and SageMaker Inference Recommender. The use of simplified versions of algorithms, pruning, and quantization is recommended to achieve better performance. SageMaker Training Compiler can speed up training of deep learning models by up to 50%, and SageMaker Neo optimizes ML models to perform up to 25x faster. Instances based on Trainium and Inferentia offer higher performance compared to other Amazon EC2 instances.
Cost Optimization
We encourage the use of existing publicly available datasets to avoid the cost of storing and processing data. Using the appropriate Amazon S3 storage tier, S3 Lifecycle policies, and the S3 Intelligent-Tiering storage class helps reduce storage costs.
SageMaker Feature Store helps reduce the cost of storing and processing duplicated datasets. We recommend keeping data and compute close together to reduce transfer costs. Serverless data pipelines, asynchronous SageMaker endpoints, and SageMaker batch transform help avoid the cost of maintaining compute infrastructure 24/7. We encourage optimization techniques (compilation, pruning, quantization, and use of simplified versions of algorithms) as well as transfer learning and incremental training to reduce training and inference costs. Scripts are provided to automatically shut down unused resources.
Sustainability
This reference architecture aligns with the goals of optimization for sustainability:
- Eliminate idle resources through serverless technologies (AWS Glue, Step Functions, SageMaker Serverless Inference) and environment automation
- Reduce unnecessary data processing and data storage by using Amazon S3 Lifecycle policies, SageMaker Feature Store, and existing, publicly available datasets and models
- Maximize the utilization of provisioned resources by right-sizing environments (using CloudWatch and SageMaker Inference Recommender) and by processing asynchronously (SageMaker asynchronous inference endpoints)
- Maximize compute efficiency by using simplified versions of algorithms, model compilation (SageMaker Training Compiler and SageMaker Neo), and compression techniques (pruning and quantization)
Implementation Resources
A detailed guide is provided to experiment with and use within your AWS account. Each stage of the Guidance, including deployment, usage, and cleanup, is examined to prepare it for deployment.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Optimize AI/ML workloads for sustainability: Part 1
Optimize AI/ML workloads for sustainability: Part 2
Optimize AI/ML workloads for sustainability: Part 3
Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.