Optimizing your AWS Infrastructure for Sustainability, Part I: Compute
As organizations align their business with sustainable practices, it is important to review every functional area. If you’re building, deploying, and maintaining an IT stack, improving its environmental impact requires informed decision making.
This three-part blog series provides strategies to optimize your AWS architecture within compute, storage, and networking.
In Part I, we offer success criteria and metrics for IT executives, plus practical guidance for architects and developers on adjusting the compute layer of your AWS infrastructure for sustainability. To help you achieve this goal, we cover key metrics, commonly used services and tools, insights into analyzing them with Amazon CloudWatch, and steps for improving them.
Our commitment to sustainable practices
At AWS, we are committed to running our business in the most environmentally friendly way possible. We also work to enable our customers to use the benefits of the cloud to better monitor and optimize their IT infrastructure. As reported in The Carbon Reduction Opportunity of Moving to Amazon Web Services, our infrastructure is 3.6 times more energy efficient than the median US enterprise data center, and moving to AWS can lower your workload’s carbon footprint by 88% for the same task.
That said, sustainability is a shared responsibility between AWS and our customers. As shown in Figure 1, we optimize for sustainability of the cloud, while customers are responsible for sustainability in the cloud, meaning they must optimize their workloads and resource utilization.
To reduce the amount of energy your workload consumes, you must use your resources efficiently. As Peter DeSantis, VP of AWS Global Infrastructure, has said, “The greenest energy is the energy we don’t use.” When you do have to consume resources, provisioning the fewest possible and using them to the fullest results in the lowest environmental impact. Translated into architectural decisions, this means the more efficient your architecture is, the more sustainable it will be.
We acknowledge that every AWS architecture is different, and there is no one-size-fits-all solution. Therefore, assess whether the proposed recommendations fit your specific requirements before applying them.
Optimizing the compute layer of your AWS infrastructure
Compute services make up the foundation of many customers’ workloads, which brings great potential for optimization. When you look at the energy proportionality of hardware (the ratio between consumed power and utilization), an idle server still consumes power, as shown in The Case for Energy-Proportional Computing. Thus, you can improve the efficiency of your workload by using the fewest number of compute resources and achieving a high utilization. Implementing the recommendations in the following sections will help you use resources more efficiently and save costs.
Reducing idle resources and maximizing utilization
Energy-Proportional Computing: A New Definition reports that compute resources are most energy efficient at high utilization, on average 70-80%, because energy efficiency drops quickly as utilization falls. The following table shows the most widely used compute services, the metrics they measure, and their associated user guides.
| Service | Metric | User guide |
| --- | --- | --- |
| Amazon Elastic Compute Cloud (Amazon EC2) | CPUUtilization | List of the available CloudWatch metrics for your instances |
| | Total Number of vCPUs | Monitor metrics with CloudWatch |
| Amazon Elastic Container Service (Amazon ECS) | CPUUtilization | Available metrics and dimensions |
| Amazon EMR | IsIdle | Monitor metrics with CloudWatch |
You can monitor these metrics with the architecture shown in Figure 2. CloudWatch provides a unified view of your resource metrics.
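As a starting point, the metrics above can be pulled and analyzed programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and valid credentials for the fetch step; the threshold of 20% and the helper names are illustrative, not part of any AWS API.

```python
from datetime import datetime, timedelta

def fetch_cpu_datapoints(instance_id, hours=24, region="us-east-1"):
    """Fetch hourly CPUUtilization averages for one EC2 instance.

    Requires boto3 and AWS credentials; shown for illustration only.
    """
    import boto3  # imported here so the pure helper below works without it
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(hours=hours),
        EndTime=datetime.utcnow(),
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    return resp["Datapoints"]

def is_underutilized(datapoints, threshold=20.0):
    """Flag an instance whose mean CPU utilization is below `threshold` percent."""
    if not datapoints:
        return True  # no datapoints at all: likely idle or stopped
    mean = sum(dp["Average"] for dp in datapoints) / len(datapoints)
    return mean < threshold
```

Instances flagged this way are candidates for downsizing, scheduling, or termination.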
With an AWS Cost & Usage Report, you can understand your resource usage. AWS Usage Queries, as shown in Figure 3, is a sample solution that provides an AWS Cloud Development Kit (AWS CDK) template to create, store, query, and visualize your AWS Cost & Usage Report.
We recommend the following services and tools to reduce idle resources and maximize utilization.
- Integrate Amazon EC2 Auto Scaling. Setting up Amazon EC2 Auto Scaling allows your workload to automatically scale up and down based on demand.
- Set up scheduled or dynamic scaling policies based on metrics such as average CPU utilization or average network in or out.
- Integrate AWS Instance Scheduler and Scheduled scaling for Amazon EC2 Auto Scaling to schedule shutdowns and terminations for resources that only need to run during business hours or on weekdays.
- Right-size resources with AWS Cost Explorer and AWS Graviton2. When you decide on an instance type, you should consider the requirements of your workload.
- Consider using T instances, which come with burstable performance, if your workload has several unpredictable spikes. This reduces the need to over-provision capacity.
- Use AWS Cost Explorer to see right-sizing recommendations for your workload. It highlights opportunities to reduce cost and improve resource efficiency by downsizing instances where possible.
- Improve the power efficiency of your compute workload by switching to Graviton2-based instances. Graviton2 is our most power-efficient processor. It delivers 2-3.5 times better CPU performance per watt than any other processor in AWS. Additionally, Graviton2 provides up to 40% better price performance over comparable current generation x86-based instances for various workloads.
- Adopt a serverless, event-driven architecture. Consider adopting a serverless, event-driven architecture to maximize overall resource utilization. Serverless architectures remove the need for you to run and maintain physical servers, because AWS abstracts that layer away. And because the cost of serverless architectures generally correlates with usage, your workload’s cost efficiency will improve as well.
- For asynchronous workloads, use an event-driven architecture so that compute is only used as needed and not in an idle state while waiting.
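To make the auto scaling recommendation concrete, here is a minimal sketch of a target-tracking scaling policy that keeps average CPU near the energy-efficient 70-80% band. The function and group names are hypothetical; attaching the policy requires boto3 and AWS credentials.

```python
def target_tracking_policy(target_cpu_percent=70.0):
    """Build a target-tracking configuration that keeps the Auto Scaling
    group's average CPU utilization near the given target."""
    return {
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }

def apply_policy(group_name, policy_name, target_cpu_percent=70.0):
    """Attach the policy to an Auto Scaling group (requires boto3 and credentials)."""
    import boto3
    client = boto3.client("autoscaling")
    return client.put_scaling_policy(
        AutoScalingGroupName=group_name,
        PolicyName=policy_name,
        **target_tracking_policy(target_cpu_percent),
    )
```

With a 70% target, the group adds instances as utilization rises above it and removes them as demand falls, so capacity tracks demand instead of sitting idle.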
Shape demand to existing supply
In contrast to matching supply to demand through methods such as automatic scaling, you can shape demand to match existing supply. This strategy makes sense especially for flexible workloads where the exact time to run a certain operation is not relevant. This could include a recurring scheduled task that runs overnight or a workload that can be interrupted.
We recommend the following services and tools to shape demand to existing supply:
- Use Amazon EC2 Spot Instances. Spot Instances take advantage of unused EC2 capacity in the AWS Cloud. By shaping your demand to the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity.
- Spot Instances save you up to 90% in cost compared to On-Demand Instances.
- Use Spot Instances for fault-tolerant, flexible, and stateless workloads that can be interrupted.
- Apply jitter to scheduled tasks. Avoid load spikes by applying jitter to scheduled tasks and shifting time-flexible workloads to less busy times.
- Assess if your scheduled tasks can be distributed to run at a random time during the hour.
- Avoid using 0 as the start minute of scheduled tasks. This is the common default value. Rather, use a number between 2 and 58 to define the start minute for scheduled tasks.
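The jitter advice above can be sketched in a few lines. This example assumes you can seed the random choice with a stable task name, so each task keeps the same start minute across deployments while different tasks spread out over the hour; the function names are illustrative.

```python
import random

def jittered_start_minute(task_id, low=2, high=58):
    """Deterministically pick a start minute in [low, high] for a scheduled
    task, spreading tasks across the hour instead of piling up at minute 0."""
    rng = random.Random(task_id)  # seeded by task name: stable across deploys
    return rng.randint(low, high)

def cron_expression(task_id, hour=3):
    """Build a cron line for a nightly task with a jittered start minute.
    (Cron field order: minute hour day-of-month month day-of-week.)"""
    return f"{jittered_start_minute(task_id)} {hour} * * *"
```

Because the minute is derived from the task name rather than drawn fresh at runtime, re-deploying the scheduler does not reshuffle start times.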
In this blog post, we discussed key metrics and recommended actions you can take to optimize your AWS infrastructure for resource efficiency. This will in turn improve the sustainability of your compute resources.
As your business grows, it is natural for metrics like total vCPU hours to increase. This is why it is important to track these metrics per unit of work, such as the number of vCPUs per 100 users or transactions. This way, the KPIs you measure are independent of your business growth.
In the next part of this blog post series, we show you how to optimize the storage part of your IT infrastructure for sustainability in the cloud!
Other blog posts in this series
- Optimizing your AWS Infrastructure for Sustainability, Part II: Storage
- Optimizing your AWS Infrastructure for Sustainability, Part III: Networking