AWS Architecture Blog
Optimize AI/ML workloads for sustainability: Part 2, model development
Machine learning (ML) models are becoming bigger and more complex, and more complexity often means more energy use. Although ML hardware is getting more efficient, the energy required to train these models is increasing sharply.
In this series, we’re following the phases of the Well-Architected machine learning lifecycle (Figure 1) to optimize your artificial intelligence (AI)/ML workloads. In Part 2, we examine the model development phase and show you how to train, tune, and evaluate your ML model to help you reduce your carbon footprint.
If you missed the first part of this series, we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing.
Model building
Define acceptable performance criteria
When you build an ML model, you’ll likely need to make trade-offs between your model’s accuracy and its carbon footprint. When we focus only on the model’s accuracy, we “ignore the economic, environmental, or social cost of reaching the reported accuracy.” Because the relationship between model accuracy and complexity is at best logarithmic, training a model longer or looking for better hyperparameters only leads to a small increase in performance.
Establish performance criteria that support your sustainability goals while meeting your business requirements, not exceeding them.
Select energy-efficient algorithms
Begin with a simple algorithm to establish a baseline. Then, test different algorithms with increasing complexity to observe whether performance has improved. If so, compare the performance gain against the difference in resources required.
Try to find simplified versions of algorithms. This will help you use fewer resources to achieve a similar outcome. For example, DistilBERT, a distilled version of BERT, has 40% fewer parameters, runs 60% faster, and preserves 97% of BERT’s performance.
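As a minimal sketch (assuming the Hugging Face transformers library and its publicly available distilbert-base-uncased checkpoint, neither of which is prescribed by this post), switching from BERT to DistilBERT is often just a change of model name:

```python
# Minimal sketch: load DistilBERT instead of BERT for a text classification task.
# Assumes the Hugging Face "transformers" library and public model checkpoints.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # ~40% fewer parameters than bert-base-uncased

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("This model is smaller and faster.", return_tensors="pt")
outputs = model(**inputs)  # fine-tune as usual; the training loop is unchanged
```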
Use pre-trained or partially pre-trained models
Consider techniques to avoid training a model from scratch:
- Transfer Learning: Use a pre-trained source model and reuse it as the starting point for a second task (see the sketch after this list). For example, a model trained on ImageNet (14 million images) generalizes well to other datasets.
- Incremental Training: Use artifacts from an existing model on an expanded dataset to train a new model.
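Below is a minimal transfer-learning sketch. It uses PyTorch and torchvision as an illustrative choice (not mandated by this post); the backbone and the number of classes are placeholders. The ImageNet-trained layers are frozen, so only a small classification head is trained:

```python
# Minimal transfer-learning sketch: reuse an ImageNet-trained backbone and
# train only a new classification head. Requires a recent torchvision release.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained layers so they are not retrained from scratch
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task (e.g., 10 classes)
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters need to be optimized,
# which greatly reduces the compute required for training.
trainable_params = [p for p in model.parameters() if p.requires_grad]
```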
Optimize your deep learning models to accelerate training
Compile your deep learning (DL) models from their high-level language representation to hardware-optimized instructions to reduce training time. You can achieve this with open-source compilers or Amazon SageMaker Training Compiler, which can speed up training of DL models by up to 50% by using SageMaker GPU instances more efficiently.
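For example, with the SageMaker Python SDK, the Training Compiler can be enabled on a Hugging Face training job with a single configuration object. In this sketch, the entry point script, IAM role, and framework versions are placeholders:

```python
# Sketch: enable SageMaker Training Compiler on a Hugging Face training job.
# The entry point, role, and framework versions are placeholders for illustration.
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",  # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.21.1",
    pytorch_version="1.11.0",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),  # compile the model for the target GPU
)
estimator.fit({"train": "s3://my-bucket/train"})
```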
Start with small experiments, datasets, and compute resources
Experiment with smaller datasets in your development notebook. This allows you to iterate quickly with limited carbon emission.
Automate the ML environment
When building your model, use Lifecycle Configuration Scripts to automatically stop idle SageMaker Notebook instances. If you are using SageMaker Studio, install the auto-shutdown Jupyter extension to detect and stop idle resources.
Use the fully managed training process provided by SageMaker to automatically launch training instances and shut them down as soon as the training job is complete. This minimizes idle compute resources and thus limits the environmental impact of your training job.
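As a sketch with the SageMaker Python SDK (the container image, role, and S3 paths are placeholders), compute is provisioned when fit() is called and released as soon as the job completes:

```python
# Sketch: a fully managed SageMaker training job. Instances are launched when
# fit() is called and shut down automatically when the job finishes.
# Role, image URI, and S3 paths are placeholders.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path="s3://my-bucket/output",
)
estimator.fit({"train": "s3://my-bucket/train"})  # no instances left running afterwards
```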
Adopt a serverless architecture for your MLOps pipelines. For example, orchestration tools like AWS Step Functions or SageMaker Pipelines only provision resources when work needs to be done. This way, you’re not maintaining compute infrastructure 24/7.
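Below is a minimal SageMaker Pipelines sketch with a single training step; the image, role, and S3 paths are placeholders, and compute is only provisioned while the step runs:

```python
# Sketch: a minimal SageMaker Pipelines definition with a single training step.
# Compute is provisioned only while the step runs; nothing is maintained between runs.
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput(s3_data="s3://my-bucket/train")},
)

pipeline = Pipeline(name="sustainable-training-pipeline", steps=[train_step])
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerRole")
pipeline.start()  # no compute infrastructure runs between pipeline executions
```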
Model training
Select sustainable AWS Regions
As mentioned in Part 1, select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to train your model.
Use a debugger
A debugger like SageMaker Debugger can identify training problems like system bottlenecks, overfitting, saturated activation functions, and under-utilization of system resources. It also provides built-in rules like LowGPUUtilization or Overfit. These rules monitor your workload and automatically stop a training job as soon as a bug is detected (Figure 2), which helps you avoid unnecessary carbon emissions.
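As a sketch with the SageMaker Python SDK (the training script, role, and instance type are placeholders), built-in rules can be attached to a training job and configured to stop it automatically when they trigger:

```python
# Sketch: attach built-in Debugger rules to a training job and stop the job
# automatically when a rule triggers. Script, role, and versions are placeholders.
from sagemaker.debugger import Rule, rule_configs
from sagemaker.pytorch import PyTorch

# Action to take when a rule triggers: stop the training job
actions = rule_configs.ActionList(rule_configs.StopTraining())

rules = [
    Rule.sagemaker(rule_configs.low_gpu_utilization(), actions=actions),
    Rule.sagemaker(rule_configs.overfit(), actions=actions),
]

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    framework_version="1.12",
    py_version="py38",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    rules=rules,
)
estimator.fit({"train": "s3://my-bucket/train"})
```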
Optimize the resources of your training environment
Reference the recommended instance types for the algorithm you’ve selected in the SageMaker documentation. For example, for DeepAR, you should start with a single CPU instance and only switch to GPU and multiple instances when necessary.
Right size your training jobs with Amazon CloudWatch metrics that monitor the utilization of resources like CPU, GPU, memory, and disk.
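For example, instance-level metrics for a training job can be retrieved programmatically with boto3. In this sketch, the training job name and time window are placeholders; training-instance metrics are published under the /aws/sagemaker/TrainingJobs namespace:

```python
# Sketch: retrieve GPU utilization for a training job's instance from CloudWatch.
# The training job name ("my-training-job") and time window are placeholders.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="/aws/sagemaker/TrainingJobs",
    MetricName="GPUUtilization",
    Dimensions=[{"Name": "Host", "Value": "my-training-job/algo-1"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```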
Consider Managed Spot Training, which takes advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity and can save you up to 90% in cost compared to On-Demand instances. By shaping your demand for the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity of the overall AWS Cloud.
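As a sketch (image, role, and S3 paths are placeholders), Managed Spot Training is enabled with a few estimator parameters, and checkpointing lets an interrupted job resume rather than restart:

```python
# Sketch: enable Managed Spot Training on a SageMaker estimator.
# Role, image URI, and S3 paths are placeholders.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    use_spot_instances=True,                         # run on spare EC2 capacity
    max_run=3600,                                    # max training time, in seconds
    max_wait=7200,                                   # max wait for Spot capacity (>= max_run)
    checkpoint_s3_uri="s3://my-bucket/checkpoints",  # resume from checkpoints after interruption
)
estimator.fit({"train": "s3://my-bucket/train"})
```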
Use efficient silicon
Use AWS Trainium, which is optimized for DL training workloads. It is expected to be our most energy-efficient processor for this purpose.
Archive or delete unnecessary training artifacts
Organize your ML experiments with SageMaker Experiments to clean up training resources you no longer need.
Reduce the volume of logs you keep. By default, CloudWatch retains logs indefinitely. By setting limited retention time for your notebooks and training logs, you’ll avoid the carbon footprint of unnecessary log storage.
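As a sketch with boto3 (the 30-day retention period is an illustrative choice), you can set a retention policy on the log group that SageMaker training jobs write to:

```python
# Sketch: set a finite retention period on the SageMaker training job log group.
# The 30-day retention period is an illustrative choice.
import boto3

logs = boto3.client("logs")

logs.put_retention_policy(
    logGroupName="/aws/sagemaker/TrainingJobs",  # log group used by training jobs
    retentionInDays=30,                          # instead of the default indefinite retention
)
```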
Model tuning and evaluation
Use efficient cross-validation techniques for hyperparameter optimization
Prefer Bayesian search over random search (and avoid grid search). Bayesian search makes intelligent guesses about the next set of parameters to pick based on the prior set of trials. It typically requires 10 times fewer jobs than random search, and thus 10 times less compute resources, to find the best hyperparameters.
Limit the maximum number of concurrent training jobs. Running hyperparameter tuning jobs concurrently gets more work done quickly. However, a tuning job improves only through successive rounds of experiments. Typically, running one training job at a time achieves the best results with the least amount of compute resources.
Carefully choose the number of hyperparameters and their ranges. You get better results and use fewer compute resources by limiting your search to a few parameters and small ranges of values. If you know that a hyperparameter is log-scaled, declare it as such (for example, with a logarithmic scaling type) to further improve the optimization.
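The following sketch combines these recommendations in a SageMaker hyperparameter tuning job: Bayesian search, one concurrent training job, a small search space, and a log-scaled learning rate. The estimator, objective metric, and ranges are placeholders:

```python
# Sketch: a hyperparameter tuning job following the recommendations above.
# The estimator, objective metric, and ranges are placeholders.
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, IntegerParameter, HyperparameterTuner

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# Keep the search space small, and declare log-scaled parameters as such
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-1, scaling_type="Logarithmic"),
    "epochs": IntegerParameter(5, 20),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "val_accuracy=([0-9\\.]+)"}],
    strategy="Bayesian",   # fewer jobs than random or grid search
    max_jobs=20,
    max_parallel_jobs=1,   # successive rounds learn from previous results
)
tuner.fit({"train": "s3://my-bucket/train"})
```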
Use warm-start hyperparameter tuning
Use warm-start to leverage the learning gathered in previous tuning jobs to inform which combinations of hyperparameters to search over in the new tuning job. This technique avoids restarting hyperparameter optimization jobs from scratch and thus reduces the compute resources needed.
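As a sketch (the parent tuning job name, estimator, and ranges are placeholders), warm start is configured on the tuner and points to a previous tuning job:

```python
# Sketch: warm-start a new hyperparameter tuning job from a previous one.
# The parent tuning job name, estimator, and ranges are placeholders.
from sagemaker.estimator import Estimator
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    WarmStartConfig,
    WarmStartTypes,
)

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# Reuse what was learned in an earlier tuning job instead of starting from scratch
warm_start_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,
    parents={"previous-tuning-job-name"},
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-5, 1e-1, scaling_type="Logarithmic")},
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "val_accuracy=([0-9\\.]+)"}],
    max_jobs=10,
    max_parallel_jobs=1,
    warm_start_config=warm_start_config,
)
tuner.fit({"train": "s3://my-bucket/train"})
```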
Measure results and improve
To monitor and quantify improvements of your training jobs, track the following metrics:
- Resources provisioned for your training jobs (InstanceCount, InstanceType, and VolumeSizeInGB)
- Efficient use of these resources (CPUUtilization, GPUUtilization, GPUMemoryUtilization, MemoryUtilization, and DiskUtilization) in the SageMaker Console, the CloudWatch Console, or your SageMaker Debugger Profiling Report
For storage:
- The total size of your Amazon Simple Storage Service (Amazon S3) buckets and storage class distribution, using Amazon S3 Storage Lens
- The size of your CloudWatch log groups
Conclusion
In this blog post, we discussed techniques and best practices to reduce the energy required to build, train, and evaluate your ML models.
We also provided recommendations for the tuning process, as it makes up a large part of the carbon impact of building an ML model. During hyperparameter and neural architecture search, hundreds of versions of a given model are created, trained, and evaluated before an optimal design is identified.
In the next post, we’ll continue our sustainability journey through the ML lifecycle and discuss the best practices you can follow when deploying and monitoring your model in production.
Want to learn more? Check out the Architecting for sustainability session at re:Invent 2021, and other blog posts on architecting for sustainability.
These practices are part of the Sustainability Pillar of the AWS Well-Architected Framework, which helps you build secure, high-performing, resilient, and efficient infrastructure for your applications and workloads. Use the AWS Well-Architected Tool to address important design considerations and ensure that your workloads follow the best practices and guidance of the Well-Architected Framework. For follow-up questions or comments, join our growing community on AWS re:Post.
Other posts in this series
- Optimize AI/ML workloads for sustainability: Part 1, identify business goals, validate ML use, and process data
- Optimize AI/ML workloads for sustainability: Part 3, deployment and monitoring