AWS Cloud Financial Management
Measure and track cloud efficiency with sustainability proxy metrics, Part II: Establish a metrics pipeline
This blog post series provides an overview of how to establish a sustainability proxy metrics showback mechanism for teams wanting to optimize their AWS usage for sustainability. In Part I, we covered the motivation for sustainability proxy metrics and key performance indicators (KPIs). We explained the concept of usage-based metrics, normalization and the inclusion of business metrics, and shared examples of how customers use these metrics to optimize for sustainability.
In Part II, we discuss how to set up a proxy metrics data pipeline that powers such a showback mechanism and supports the organizational adoption of efficiency best practices at scale.
Set up a proxy metrics pipeline
Figure 1 below depicts a conceptual overview of a proxy metrics data pipeline. We assume a multi-account structure with one management account and multiple child accounts running various workloads. One central optimization account is dedicated to ingesting, processing, and visualizing cost and resource utilization data from all workload accounts.
On AWS, such a multi-account governance strategy is typically implemented using AWS Control Tower and is referred to as an AWS landing zone. This construct uses AWS Organizations to hierarchically structure accounts under organizational units (OUs). The depicted optimization account would typically be part of an organization’s Infrastructure OU, while workload accounts would be part of a Workload OU. We highly recommend having a Resource Tagging strategy in place – a best practice for consistently categorizing and grouping your resources by assigning metadata in the form of key-value pairs. This lets you granularly define the scope of resources to include in a specific metric later on.
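For illustration, the following minimal sketch uses the Resource Groups Tagging API to enumerate resources that carry a hypothetical `application` tag value – the kind of tag-based scoping that later determines which resources feed into a specific proxy metric. The tag key, tag value, and region are assumptions; replace them with entries from your own tagging dictionary.

```python
import boto3

# Minimal sketch: enumerate resources that carry a hypothetical "application"
# cost allocation tag. A consistent tag like this later defines the scope of
# resources that feed into a specific proxy metric.
tagging = boto3.client("resourcegroupstaggingapi", region_name="eu-west-1")

pages = tagging.get_paginator("get_resources").paginate(
    TagFilters=[{"Key": "application", "Values": ["checkout-service"]}]  # assumed tag
)

for page in pages:
    for mapping in page["ResourceTagMappingList"]:
        print(mapping["ResourceARN"], mapping["Tags"])
```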
With this in place, establishing an initial proxy metrics pipeline includes:
- Creating AWS Cost and Usage Reports (CUR) to be stored in the organization’s management account, including your established user-defined cost allocation tags, and replicated to the central optimization account. CUR is your main data source for detailed consumption information across all AWS services.
- Periodically pushing account and workload-specific metrics from the respective workload OUs to the central optimization account. The goal is to ingest additional data points needed for metrics calculation that are not covered by CUR.
  - For Amazon CloudWatch, you capture logs and metrics from multiple source accounts and regions. Check out the AWS prescriptive guidance “Using CloudWatch in centralized or distributed accounts” to learn about the different ways to process logs and metrics from multiple accounts.
  - AWS Compute Optimizer is a good source of utilization information and rightsizing recommendations. As Compute Optimizer is a regional service, data needs to be collected from multiple regions into the central optimization account (a minimal collection sketch follows this list). Check out the Compute Optimizer Data Collection lab for detailed instructions.
  - Depending on your workload, you may need additional metrics. Check out the Data Collection Modules of the corresponding AWS Well-Architected lab to learn how to integrate other metrics sources.
- Ingesting business metrics via either a push or pull mechanism into a central S3 “metrics lake” bucket in the optimization account (a minimal push sketch follows this list). Depending on your intended metrics, the source data is located in a data warehouse, database, or monitoring and observability system. To easily aggregate a business metric with AWS resource usage, it must be available in a time series format.
- All data – CUR, AWS metrics, business metrics – in the metrics lake bucket is then catalogued, periodically extracted, cleaned, and transformed to be made available for visualization. Processed data sets are time-based and typically consist of three elements: (1) the applicable time frame, (2) a workload metadata tag or AWS account as discriminator, and (3) a metric, such as a business metric, used as the denominator for normalization.
- Sustainability proxy metrics and KPIs are displayed in global dashboards, which are typically refreshed daily with hourly data granularity.
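As a minimal sketch of the Compute Optimizer collection mentioned above, the following script pulls EC2 rightsizing recommendations from several regions and stores them in a central bucket. The region list and bucket name are placeholders; the Compute Optimizer Data Collection lab provides a complete, production-ready implementation.

```python
import json
import boto3

# Minimal sketch: pull EC2 rightsizing recommendations from several regions and
# store them as JSON objects in the central metrics lake bucket.
# The region list and bucket name are assumptions for illustration.
REGIONS = ["eu-west-1", "us-east-1"]
METRICS_BUCKET = "central-optimization-metrics-lake"  # hypothetical bucket

s3 = boto3.client("s3")

for region in REGIONS:
    compute_optimizer = boto3.client("compute-optimizer", region_name=region)
    recommendations, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = compute_optimizer.get_ec2_instance_recommendations(**kwargs)
        recommendations.extend(page["instanceRecommendations"])
        token = page.get("nextToken")
        if not token:
            break
    s3.put_object(
        Bucket=METRICS_BUCKET,
        Key=f"compute-optimizer/ec2/{region}.json",
        Body=json.dumps(recommendations, default=str),  # default=str handles timestamps
    )
```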
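Similarly, for the business metrics ingestion, a simple push mechanism can be a scheduled job in the workload account that writes a small time series object into the metrics lake bucket. The sketch below assumes a hypothetical orders-per-hour metric, bucket name, and tag value; in practice the values would come from your data warehouse or observability system.

```python
import csv
import io
from datetime import datetime, timezone

import boto3

# Minimal sketch: push an hourly business metric (orders processed) as a small
# CSV time series into the central metrics lake bucket. Bucket name, prefix,
# tag value, and the metric itself are assumptions for illustration.
METRICS_BUCKET = "central-optimization-metrics-lake"  # hypothetical bucket
APPLICATION = "checkout-service"                      # value of the "application" tag


def publish_business_metric(hourly_values):
    """hourly_values: list of (datetime, float) tuples covering the export window."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp_hour", "application", "metric_name", "metric_value"])
    for timestamp, value in hourly_values:
        writer.writerow([timestamp.isoformat(), APPLICATION, "orders_processed", value])

    day = hourly_values[0][0].strftime("%Y-%m-%d")
    boto3.client("s3").put_object(
        Bucket=METRICS_BUCKET,
        Key=f"business-metrics/application={APPLICATION}/{day}.csv",
        Body=buffer.getvalue(),
    )


# Example: publish a single data point for the current hour.
now = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
publish_business_metric([(now, 1250.0)])
```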
Figure 1 shows how the above points relate to specific steps in the proxy metrics pipeline. For a simple step-by-step guide to get started, have a look at the AWS Well-Architected lab “Turning Cost & Usage Reports into Efficiency Reports”.
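To give a flavor of the transformation step, the following sketch runs an Athena query that aggregates hourly EC2 usage from CUR per application tag and divides it by the business metric from the metrics lake, producing the three-element data set described above. The Glue database, table, and column names are assumptions that depend on your CUR configuration and catalog setup; the Well-Architected lab walks through the full implementation.

```python
import time

import boto3

# Minimal sketch: aggregate hourly EC2 usage from CUR per "application" tag and
# normalize it by the business metric stored in the metrics lake. The Glue
# database, table, and column names below are assumptions.
ATHENA_DATABASE = "proxy_metrics"  # hypothetical Glue database
ATHENA_OUTPUT = "s3://central-optimization-metrics-lake/athena-results/"

QUERY = """
SELECT cur.usage_hour,
       cur.application,
       cur.instance_hours,
       cur.instance_hours / NULLIF(bm.metric_value, 0) AS instance_hours_per_order
FROM (
    SELECT date_trunc('hour', line_item_usage_start_date) AS usage_hour,
           resource_tags_user_application AS application,
           SUM(line_item_usage_amount) AS instance_hours
    FROM cur_table                            -- assumed CUR table name
    WHERE line_item_product_code = 'AmazonEC2'
      AND line_item_usage_type LIKE '%BoxUsage%'
    GROUP BY 1, 2
) cur
JOIN business_metrics bm                      -- assumed metrics lake table
  ON bm.timestamp_hour = cur.usage_hour
 AND bm.application = cur.application
"""

athena = boto3.client("athena")
query_id = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": ATHENA_DATABASE},
    ResultConfiguration={"OutputLocation": ATHENA_OUTPUT},
)["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([field.get("VarCharValue") for field in row["Data"]])
```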
Visualize the data for impact
Communicate sustainability proxy metrics and KPIs to their two main audiences: management, who wants to understand optimization opportunities and progress toward company goals, and application teams, who own their KPIs. Dashboards have to make it straightforward and accessible to validate earlier optimization assumptions and experiments, and to identify the next best opportunity to optimize. An example is depicted in Figure 2 below. To learn more, the Amazon Builders’ Library provides detailed information on building dashboards for operational visibility.
For Compute proxy metrics, start with the following visualizations:
- Visualize overall compute resource usage across different account IDs or workload tags to see the ratio between Amazon EC2, AWS Fargate, and AWS Lambda.
- Visualize the compute capacity by EC2 instance type for certain workload tags (or account IDs, if no tagging strategy is in place). This popular chart indicates where to start optimizing – the main contributors are located at the top.
- Imagine further drill downs which show the usage over time for a certain application normalized by this application’s business metric.
- Customers often have adoption goals for, for example, AWS Graviton or Amazon EC2 Spot Instances. Visualize the adoption percentage over time (see the sketch after this list).
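For the adoption goals in the last point, a Graviton usage share can be derived directly from CUR, assuming the `product_physical_processor` product attribute is populated in your CUR table. The query below is a minimal sketch using the same hypothetical table name as before; it can be run with the same Athena flow as the earlier example.

```python
# Minimal sketch: Graviton adoption share over time, derived from CUR.
# Assumes the product_physical_processor attribute is populated in your CUR
# table; run this string with the same Athena flow as in the earlier sketch.
GRAVITON_ADOPTION_QUERY = """
SELECT date_trunc('day', line_item_usage_start_date) AS usage_day,
       100.0 * SUM(CASE WHEN product_physical_processor LIKE '%Graviton%'
                        THEN line_item_usage_amount ELSE 0 END)
             / NULLIF(SUM(line_item_usage_amount), 0) AS graviton_usage_percent
FROM cur_table                                -- assumed CUR table name
WHERE line_item_product_code = 'AmazonEC2'
  AND line_item_usage_type LIKE '%BoxUsage%'
GROUP BY 1
ORDER BY 1
"""
```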
Establish sustainability proxy metrics in processes
Sustainability proxy metrics dashboards are calculated from AWS Cost and Usage Reports. This data is typically already used and understood for cost management by the Cloud Center of Excellence (CCoE), FinOps, or cloud cost efficiency teams. Extend established cost showback processes with sustainability proxy metrics so you don’t start from zero and can leverage past learnings about what works for your company. Define a sponsor and a product owner for the ongoing development of the showback mechanism. This development is not a one-off activity: gather and incorporate feedback, implement new metrics and visuals that work, and remove visuals and data that distract rather than inform, to maximize the value of the mechanism.
With over 200 AWS Cloud services, you have numerous possibilities to track, visualize, and optimize. Start with services that are popular in your organization and for which you hold the largest share of the shared responsibility model for sustainability – for example, Amazon EC2 instances, where you control the instance size and family, own the utilization, and manage the efficiency of installed software. Focus on impactful quick wins from the AWS Well-Architected Sustainability Pillar best practices that have an ongoing effect on your resource efficiency, like using policies to manage the lifecycle of your datasets (see the sketch below), scheduling your build environments as needed, or migrating to managed services. Run a pilot of AWS Well-Architected Framework reviews that focus on the accounts with the highest service consumption.
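As an example of the dataset lifecycle quick win, the sketch below applies a hypothetical lifecycle configuration that moves cold data to an archive tier and expires temporary build artifacts. The bucket name, prefixes, and retention periods are assumptions to adapt to your own data classification.

```python
import boto3

# Minimal sketch: manage the lifecycle of a dataset bucket so that cold data
# moves to an archive tier and temporary build artifacts expire automatically.
# Bucket name, prefixes, and retention periods are assumptions for illustration.
boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-dataset-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Filter": {"Prefix": "datasets/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            },
            {
                "ID": "expire-build-artifacts",
                "Filter": {"Prefix": "build-artifacts/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            },
        ]
    },
)
```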
Make dashboards available to individual application teams to consume as they see fit, for example in team meetings or operational reviews. Email reports work as well, but they become outdated quickly and lack interactive controls for drill-downs. Decide whether data should be visible only to individual teams or to your whole company.
Celebrate wins and replicate successes with senior leadership visibility. Use sustainability KPIs from before and after an optimization measure to communicate successes. Share the associated cost benefits with stakeholders who are focused on monetary gains, and complement the numbers with long-term data from the AWS Customer Carbon Footprint Tool. Celebrating wins with leadership visibility fosters awareness and recognition of sustainable behavior, and sharing the details of optimizations inspires other teams to replicate successes for their applications.
Conclusion and what to do next
In this post, we’ve explained how to set up a proxy metrics data pipeline, build effective visuals, and apply best practices to establish a showback mechanism. If you want to put this theory into practice and start measuring and optimizing sustainability proxy metrics for your application, you’ll find step-by-step implementation instructions and a set of best practices in the CUDOS Sustainability Dashboard workshop.
If you are looking for detailed information on how to optimize your workloads for sustainability, check out the AWS Well-Architected Sustainability Pillar and the AWS re:Invent 2022 session Delivering sustainable, high-performing architectures (SUS303).