AWS Architecture Blog

Optimizing your AWS Infrastructure for Sustainability, Part I: Compute

As organizations align their business with sustainable practices, it is important to review every functional area. If you’re building, deploying, and maintaining an IT stack, improving its environmental impact requires informed decision making.

This three-part blog series provides strategies to optimize your AWS architecture within compute, storage, and networking.

In Part I, we provide success criteria and metrics for IT executives and practical guidance for architects and developers on how to adjust the compute layer of your AWS infrastructure for sustainability. To help you achieve this goal, we cover key metrics, commonly used services and tools, how to analyze them with Amazon CloudWatch, and steps for improving them.

Our commitment to sustainable practices

At AWS, we are committed to running our business in the most environmentally friendly way possible. We also work to enable our customers to use the benefits of the cloud to better monitor and optimize their IT infrastructure. As reported in The Carbon Reduction Opportunity of Moving to Amazon Web Services, our infrastructure is 3.6 times more energy efficient than the median US enterprise data center, and moving to AWS can lower your workload’s carbon footprint by 88% for the same task.

That said, sustainability is a shared responsibility between AWS and our customers. As shown in Figure 1, we optimize for sustainability of the cloud, while customers are responsible for sustainability in the cloud, meaning they must optimize their workloads and resource utilization.

Figure 1. Shared responsibility model for sustainability

To reduce the amount of energy your workload consumes, you must use your resources efficiently. As Peter DeSantis, VP of AWS Global Infrastructure, has said, “The greenest energy is the energy we don’t use.” When you do have to use resources, using as few as possible and using them to the fullest results in the lowest environmental impact. Translated to your architectural decisions, this means the more efficient your architecture is, the more sustainable it will be.

We acknowledge that every AWS architecture is different and there is no one-size-fits-all solution. Keep in mind that you should first assess whether the proposed recommendations fit your specific requirements.

Optimizing the compute layer of your AWS infrastructure

Compute services make up the foundation of many customers’ workloads, which makes them a prime candidate for optimization. When you look at the energy proportionality of hardware (the ratio between consumed power and utilization), an idle server still consumes power, as shown in The Case for Energy-Proportional Computing. You can therefore improve the efficiency of your workload by using the fewest compute resources possible and keeping their utilization high. Implementing the recommendations in the following sections will help you use resources more efficiently and save costs.

Reducing idle resources and maximizing utilization

Energy-Proportional Computing: A New Definition recommends keeping the average utilization of compute resources high, in the range of 70-80%, because energy efficiency drops quickly as utilization falls. The following table shows the most widely used compute services, the utilization metrics they expose, and their associated user guides.

Service | Metric | Source
Amazon Elastic Compute Cloud (Amazon EC2) | CPUUtilization | List of the available CloudWatch metrics for your instances
Amazon EC2 | Total Number of vCPUs | Monitor metrics with CloudWatch
Amazon Elastic Container Service (Amazon ECS) | CPUUtilization | Available metrics and dimensions
Amazon EMR | IsIdle | Monitor metrics with CloudWatch

You can monitor these metrics with the architecture shown in Figure 2. CloudWatch provides a unified view of your resource metrics.
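
Figure 2 shows the console-based view; if you prefer to pull the same data programmatically, the following is a minimal boto3 sketch (the instance ID, region, and time range are placeholders) that retrieves two weeks of hourly CPUUtilization averages for a single EC2 instance:

```python
# A minimal sketch: query average CPUUtilization for one EC2 instance from
# CloudWatch. Instance ID and region are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(days=14),
    EndTime=now,
    Period=3600,                # one data point per hour
    Statistics=["Average"],
)

# A consistently low average suggests the instance is over-provisioned or idle.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))
```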

Figure 2. CloudWatch monitors your compute resources

With an AWS Cost & Usage Report, you can understand your resource usage. AWS Usage Queries, as shown in Figure 3, is a sample solution that provides an AWS Cloud Development Kit (AWS CDK) template to create, store, query, and visualize your AWS Cost & Usage Report.
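
The AWS Usage Queries sample uses AWS CDK and Amazon Athena on top of the report files. As a lighter-weight alternative sketch (assuming Cost Explorer is enabled in your account, and with placeholder dates), you can pull a similar per-instance-type view of EC2 usage through the Cost Explorer API:

```python
# A minimal sketch using the Cost Explorer API (not the CUR itself) to group
# EC2 usage quantity by instance type. Dates are placeholders.
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2021-09-01", "End": "2021-10-01"},
    Granularity="MONTHLY",
    Metrics=["UsageQuantity"],
    Filter={
        "Dimensions": {
            "Key": "SERVICE",
            "Values": ["Amazon Elastic Compute Cloud - Compute"],
        }
    },
    GroupBy=[{"Type": "DIMENSION", "Key": "INSTANCE_TYPE"}],
)

# Print usage quantity per instance type for the month.
for group in response["ResultsByTime"][0]["Groups"]:
    instance_type = group["Keys"][0]
    quantity = group["Metrics"]["UsageQuantity"]["Amount"]
    print(instance_type, quantity)
```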

Figure 3. AWS Cost & Usage Report for monitoring your compute resources

We recommend the following services and tools to reduce idle resources and maximize utilization.

  • Integrate Amazon EC2 Auto Scaling. Setting up Amazon EC2 Auto Scaling allows your workload to automatically scale out and in based on demand (a minimal scaling-policy sketch follows this list).
  • Right-size resources with AWS Cost Explorer and AWS Graviton2. When you decide on an instance type, you should consider the requirements of your workload.
    • Consider using T instances, which come with burstable performance, if your workload has several unpredictable spikes. This reduces the need to over-provision capacity.
    • Use AWS Cost Explorer to see right-sizing recommendations for your workload. It highlights opportunities to reduce cost and improve resource efficiency by downsizing instances where possible.
    • Improve the power efficiency of your compute workload by switching to Graviton2-based instances. Graviton2 is our most power-efficient processor. It delivers 2-3.5 times better CPU performance per watt than any other processor in AWS. Additionally, Graviton2 provides up to 40% better price performance over comparable current generation x86-based instances for various workloads.
  • Adopt a serverless, event-driven architecture. Consider adopting a serverless, event-driven architecture to maximize overall resource utilization. A serverless architecture removes the need for you to run and maintain servers, because the underlying infrastructure is abstracted by AWS services. And because the cost of serverless architectures generally correlates with the level of usage, your workload’s cost efficiency will improve as well.
    • For asynchronous workloads, use an event-driven architecture so that compute is only used as needed and not in an idle state while waiting.
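
As referenced in the Auto Scaling item above, the following is a minimal sketch (the Auto Scaling group name is a placeholder and the group is assumed to already exist) of a target tracking policy that keeps average CPU utilization around the 70% mark discussed earlier:

```python
# A minimal sketch: attach a target tracking policy to an existing Auto Scaling
# group so it scales out and in around ~70% average CPU utilization.
# The group name is a placeholder; min/max capacity are set on the group itself.
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-workload-asg",
    PolicyName="keep-cpu-around-70-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
    },
)
```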

Shape demand to existing supply

In contrast to matching supply to demand through methods such as automatic scaling, you can also shape demand to match the existing supply. This strategy is especially well suited to flexible workloads where the exact time a certain operation runs is not critical, such as a recurring task scheduled overnight or a workload that can be interrupted.

We recommend the following services and tools to shape demand to existing supply:

  • Use Amazon EC2 Spot Instances. Spot Instances take advantage of unused EC2 capacity in the AWS Cloud. By shaping your demand to the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity (a minimal launch sketch follows this list).
    • Spot Instances can save you up to 90% in cost compared to On-Demand Instances.
    • Use Spot Instances for fault-tolerant, flexible, and stateless workloads that can be interrupted.
  • Apply jitter to scheduled tasks. Avoid load spikes by applying jitter to scheduled tasks and shift time-flexible workloads.
    • Assess if your scheduled tasks can be distributed to run at a random time during the hour.
    • Avoid using 0 as the start minute of scheduled tasks, which is the common default value. Instead, use a number between 2 and 58 as the start minute (a minimal jitter sketch follows this list).
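
As referenced in the Spot Instances item above, the following is a minimal sketch of launching a single interruptible worker on Spot capacity (the AMI ID is a placeholder; a production workload would typically also handle interruption notices):

```python
# A minimal sketch: launch a fault-tolerant, stateless worker on Spot capacity.
# The AMI ID is a placeholder; without a MaxPrice, the On-Demand price is the cap.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m6g.large",          # a Graviton2-based instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```

And, as referenced in the last item above, a minimal sketch of applying jitter to a nightly schedule could look like the following (the rule name and schedule are placeholders, and the rule still needs a target attached separately with put_targets):

```python
# A minimal sketch: pick a random start minute between 2 and 58 once at
# deployment time and use it in an EventBridge cron schedule, so scheduled
# tasks across accounts and teams don't all fire at minute 0.
import random

import boto3

events = boto3.client("events", region_name="eu-west-1")

start_minute = random.randint(2, 58)  # avoid the crowded default of minute 0

events.put_rule(
    Name="nightly-report-job",
    ScheduleExpression=f"cron({start_minute} 2 * * ? *)",  # around 02:00 UTC
    State="ENABLED",
)
```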
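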

Conclusion

In this blog post, we discussed key metrics and recommended actions you can take to optimize your AWS infrastructure for resource efficiency. This will in turn improve the sustainability of your compute resources.

As your business grows, it is natural that metrics like total vCPU hours will increase. This is why it is important to track these metrics per unit of work, such as vCPU hours per 100 users or per transaction. This way, the KPIs you measure stay independent of your business growth.
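
As a rough illustration with made-up numbers, normalizing a raw metric by a business KPI could look like this:

```python
# A toy illustration (all numbers are made up): normalize total vCPU hours by a
# business KPI so the metric stays comparable as the business grows.
total_vcpu_hours = 12_500      # e.g. summed from the Cost & Usage Report for one month
transactions = 4_200_000       # business KPI for the same month

vcpu_hours_per_100k_transactions = total_vcpu_hours / (transactions / 100_000)
print(f"{vcpu_hours_per_100k_transactions:.2f} vCPU hours per 100k transactions")
```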

In the next part of this blog post series, we show you how to optimize the storage part of your IT infrastructure for sustainability in the cloud!

Katja Philipp

Katja Philipp is an Associate Solutions Architect based in Munich, Germany. With a background in Information Systems (M.Sc.), she joined AWS in September 2020 through the TechU graduate program. She enables her customers in the Power & Utilities vertical with best practices around their cloud journey. Katja is passionate about sustainability and how technology can be leveraged to solve current challenges for a better future.

Aleena Yunus

Aleena Yunus is an Associate Startup Solutions Architect in DACH. She has a Master’s in Computer Science with a focus on distributed systems and network security. She enjoys developing, helping her customers innovate on the cloud, and applying architectural best practices. Aleena is an avid reader and her favorite author is John Steinbeck.

Otis Antoniou

Otis Antoniou is a Senior Solutions Architect based in London. Otis is focused on supporting Banking, Payments, and Capital Markets customers, helping them innovate and solve business problems using AWS Services. Outside of work, Otis enjoys playing team sports, racket sports, and he is passionate about music.

Ceren Tahtasiz

Ceren Tahtasiz is a Startup Solutions Architect based in London. She helps startups grow by understanding their goals and challenges, and guiding them on how they can get started on AWS. She’s passionate about enabling customers to launch cloud-native products that are resilient and scalable. Her core focus areas are serverless technologies and sustainability in the cloud.