Measure and track cloud efficiency with sustainability proxy metrics, Part I: What are proxy metrics?

Sustainability has become an important decision-making factor for customers, employees, regulators, investors, and partners. Customers have started their journey towards a sustainable business and operations. If you’re building, deploying, and maintaining IT infrastructure, reducing its environmental impact is an important step in reaching company-wide sustainability goals. Thus, sustainability has become a non-functional requirement in modern software and systems architecture, along with security, maintainability, reliability and others.

When it comes to architecting workloads in the cloud, sustainability is a shared responsibility between AWS and customers. While AWS optimizes for sustainability of the cloud, customers are responsible for sustainability in the cloud: they optimize their service usage and resource efficiency.

This blog post series provides an overview of how you can establish a sustainability proxy metrics showback mechanism for teams that want to optimize their AWS usage for sustainability. In Part I, we introduce the concept of proxy metrics and the importance of normalization. We also show examples of how customers have used this concept to reduce the environmental impact of their applications. In Part II: Establish a metrics pipeline, we discuss how you set up a proxy metrics data pipeline to establish a sustainability proxy metrics showback mechanism.

Optimize your workloads with proxy metrics

Every optimization should start with a goal informed by metrics or KPIs, such as reducing cost, increasing performance, or reducing greenhouse gas emissions. The AWS Customer Carbon Footprint Tool (CCFT) provides the important output metric of the greenhouse gas emissions associated with a customer’s AWS service usage. This emission data is used for reporting and for understanding the high-level impact on a monthly basis. However, while AWS is working to increase the scope and granularity of the CCFT (read this blog), a practice of continuous optimization cycles calls for fine-grained metrics. Absolute emissions don’t expose the efficiency of a workload. Emissions are the outcome of multiple factors, including factors that are outside the responsibility of application teams, such as how heavily an application is used or the carbon intensity of energy.

For these purposes we complement the carbon emissions reported by the AWS Customer Carbon Footprint Tool with dependent metrics that we call sustainability proxy metrics. We have also launched the Sustainability Proxy Metrics Dashboard (you can access the dashboard from this link), as part of the Cloud Intelligence Dashboards.

Good sustainability proxy metrics serve as fine-grained substitutes for carbon emissions and provide insights into workload efficiency. They are metrics that we track in near real time and break down by application team and resource, so they are suitable for fast optimization cycles. They are tangible metrics that reflect resource usage in terms of compute, storage, and networking (read these blogs).

Flow diagram showing business needs are supported by operational processes which make use of AWS services. AWS service usage implies cost as well as emissions associated with energy and physical resource usage.

Figure 1. AWS emissions overview

As depicted in Figure 1 on the right, calculating greenhouse gas emissions for AWS service usage depends on multiple data sources. This includes the energy required to run cloud resources (Scope 1 & 2) and the indirect emissions associated with the lifecycle of physical resources, upstream and downstream in the value chain (Scope 3). Similarly, cost is a simple function of AWS service usage. But even though cost reflects usage, volume-based discounts reduce cost without reducing the associated emissions. Also, the pricing structure of certain services does not reflect every aspect of resource usage. Consider data transfer pricing: there is no charge for inbound data transfer across all services in all Regions, and data transfer charges don’t differ depending on end customer proximity. AWS service usage in turn depends on and is used by a customer’s operational processes to fulfill business needs, completing the data flow on the left. All of this comes back to efficiency: using the least amount of resources to fulfill business needs.

Normalize metrics to allow for comparison

We sometimes see customers counting the number of Amazon EC2 instances, or the number of instance hours, to quantify resource consumption. On their own, these metrics do not help to compare applications, identify top contributors to consumption, or spot trends. Some applications run instances only for minutes before termination; others run a single instance for a whole month. Instance size matters in a similar way. Instead of just using instance hours, you have to factor in the number of vCPUs of an instance. We call this normalization.
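As a minimal sketch of this idea, the following Python snippet multiplies instance hours by the vCPU count of each instance type. The usage records and application names are hypothetical, and the small `VCPUS` lookup table is an assumption for illustration; in practice you would derive both from your own usage data.

```python
# vCPU counts per instance type (small illustrative lookup table).
VCPUS = {"m5.large": 2, "m5.4xlarge": 16}

# Hypothetical usage records: (application, instance type, instance hours).
usage = [
    ("batch-job", "m5.4xlarge", 0.5),  # large instance, runs for 30 minutes
    ("web-app", "m5.large", 720.0),    # small instance, runs for a whole month
]


def vcpu_hours(records):
    """Sum instance hours weighted by vCPU count, per application."""
    totals = {}
    for app, instance_type, hours in records:
        totals[app] = totals.get(app, 0.0) + hours * VCPUS[instance_type]
    return totals


totals = vcpu_hours(usage)
for app, value in sorted(totals.items()):
    print(f"{app}: {value:.1f} vCPU hours")
```

Counting instances would rate both applications equally at one instance each. Normalized to vCPU hours, the difference in consumption becomes visible and comparable.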

There are many ways to normalize:

Normalize resource usage: Use the information about the instance type and multiply the instance hours by the number of vCPUs. Alternatively, take normalization factors into account, such as those used by Amazon EC2 Reserved Instances. The same applies to other services like Amazon S3 or Amazon EBS, for which you take GB hours instead.
For KPIs, calculate the ratio of desired usage in relation to total usage. That’s already the case with CPU utilization. If your goal is Amazon EC2 Spot adoption, then the KPI is all Spot vCPU hours divided by the total vCPU hours. And if it’s AWS Graviton adoption, then it is all Graviton vCPU hours divided by the total vCPU hours. You define a minimum target percentage for your application teams for this type of KPI.
Use a scoring system to weight services and features differently and incentivize application teams to use resource-efficient services. For example, weight the Amazon S3 Standard storage class higher than Amazon S3 Intelligent-Tiering by applying a multiplier, as the service description of the latter gives AWS the flexibility to use less energy and less hardware to provide the service. The goal for application teams is to drive down the weighted usage.
Resource efficiency is using the least amount of resources to fulfill business needs. Your KPIs or metrics have to factor this in by normalizing the resource usage by business metrics. We will dive deeper into this in the next section.
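The ratio and scoring approaches above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the `Usage` record, the sample figures, and the multipliers in `STORAGE_WEIGHTS` are all hypothetical, and you would choose weights that match the incentives you want to set.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical, already normalized usage record."""
    vcpu_hours: float
    is_spot: bool
    is_graviton: bool


def adoption(records, predicate):
    """Share of total vCPU hours (0.0 to 1.0) for which predicate holds."""
    total = sum(r.vcpu_hours for r in records)
    if total == 0:
        return 0.0
    return sum(r.vcpu_hours for r in records if predicate(r)) / total


# Hypothetical multipliers: a higher weight means the usage counts more.
STORAGE_WEIGHTS = {"S3 Standard": 1.0, "S3 Intelligent-Tiering": 0.8}


def weighted_storage(gb_hours_by_class):
    """Weighted GB hours; teams aim to drive this number down."""
    return sum(gb * STORAGE_WEIGHTS[cls] for cls, gb in gb_hours_by_class.items())


records = [
    Usage(600.0, is_spot=True, is_graviton=False),
    Usage(300.0, is_spot=False, is_graviton=True),
    Usage(100.0, is_spot=False, is_graviton=False),
]
spot_share = adoption(records, lambda r: r.is_spot)
graviton_share = adoption(records, lambda r: r.is_graviton)
score = weighted_storage({"S3 Standard": 1000.0, "S3 Intelligent-Tiering": 500.0})

print(f"Spot adoption: {spot_share:.0%}")
print(f"Graviton adoption: {graviton_share:.0%}")
print(f"Weighted storage usage: {score:.0f} GB hours")
```

Because both adoption KPIs share the same denominator (total vCPU hours), they can be compared across teams and checked against a minimum target percentage.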

Normalize by business metrics

An increase in resource usage is not a cause for alarm when your business grows, but steady consumption while customer demand drops is. Factoring business metrics into your KPIs helps to track and communicate efficiency over time. Business metrics are specific to the purpose of a workload. Examples include the number of monthly active users, insurance policies managed, or successful calls to an API. You divide your resource usage by a business metric (read this user guide “Evaluate specific improvements”) to calculate a sustainability KPI, like vCPU hours per transaction, as depicted in the equation below. Ideally, you want to see your sustainability KPIs go down or at least stay level. You will find the related concept of unit metrics for cost in the blog post “choose, create, and track your unit metrics for your applications”.

Figure 2. Sustainability proxy metrics equation
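In other words, a sustainability KPI is resource usage divided by a business metric. The following Python sketch computes vCPU hours per 1,000 transactions over three months. The monthly figures are invented for illustration, and the `kpi` helper is a hypothetical name rather than part of any AWS tool.

```python
def kpi(resource_usage, business_metric):
    """Resource usage per unit of business outcome."""
    if business_metric <= 0:
        raise ValueError("business metric must be positive")
    return resource_usage / business_metric


# Hypothetical monthly figures: (month, vCPU hours, transactions).
monthly = [
    ("2023-01", 1440.0, 1_200_000),
    ("2023-02", 1600.0, 1_600_000),  # usage grows, demand grows faster
    ("2023-03", 1600.0, 1_000_000),  # usage steady, demand drops
]

# Scale to "per 1,000 transactions" to keep the numbers readable.
kpis = {month: kpi(hours, tx) * 1000 for month, hours, tx in monthly}

for month, value in kpis.items():
    print(f"{month}: {value:.2f} vCPU hours per 1,000 transactions")
```

In this made-up data, absolute usage rises in February while the KPI improves, and usage stays flat in March while the KPI gets worse. That is the kind of signal absolute usage alone would hide.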

In the AWS re:Invent 2022 session Build a cost-, energy-, and resource-efficient compute environment (CMP204) (watch the recording), Arm, a global semiconductor industry leader, presented how they measured, tracked, and reduced the impact of Electronic Design Automation (EDA) jobs. They used Amazon EC2 instances’ vCPU hours to calculate KPIs for Amazon EC2 Spot adoption, AWS Graviton adoption, and the resources needed per job.

Similarly, Amazon Prime Video explained in the AWS re:Invent 2022 session Architecting sustainably and reducing your AWS carbon footprint (SUS205) (watch the recording) how they used the following sustainability proxy metrics to quantify and track the effectiveness of optimizations:

Playback experience: Infrastructure cost ($) per 1,000 concurrent streams
Content delivery: Delivery bandwidth (Gbps) per stream
Content discovery experience: Normalized Instance Hours (NIH) per 1,000 page impressions
Customer acquisition: Infrastructure cost ($) per subscription

Optimizing towards their goals, Prime Video implemented tradeoffs between sustainability goals and other non-functional requirements. To match the provisioning of resources to the spiky demand from viewers of “Thursday Night Football”, they implemented automated contingency switches that turn off non-critical customer experience features if the system is under duress.

Conclusion

In this post, we’ve covered the motivation for sustainability proxy metrics and KPIs. We explained the concepts of usage-based metrics, normalization, and the inclusion of business metrics, and shared examples of how customers are using these metrics to optimize for sustainability.

In Part II of this blog post series, we’ll dive deeper into how you set up a proxy metrics data pipeline to establish a sustainability proxy metrics showback mechanism for teams wanting to optimize their AWS usage for sustainability and for organizational adoption of efficiency best practices at scale.

For detailed information on how to optimize your workloads for sustainability, please refer to the AWS Well-Architected Sustainability Pillar. If you’re interested in measuring and optimizing sustainability proxy metrics for your applications, locate the “Sustainability Proxy Metrics Dashboard” and start implementing it today.
