AWS Cloud Operations & Migrations Blog

Choose, create, and track your unit metrics for your applications

When you operate in the variable spend model of the cloud, business growth can translate into a variable bill that reflects the activity of the workloads in your environment. For some customers, a monthly increase in their AWS bill is a normal part of growth, but for many it is an unwanted outcome. Therefore, it is important to use the right metrics to assess the economics of your infrastructure.

Unit metrics, defined as 'cost per unit of business value', allow you to measure your costs in terms of delivered business value. For example, an e-commerce retail company may want to know its 'cost per customer visit' or 'cost per conversion', in addition to the cumulative bill of its e-commerce solution in the cloud, as these two business values are expected to grow. This helps isolate the effect of business growth from cost variations, giving you clearer, less biased, and more insightful information. Ideally, you want to see your unit metrics go down or, at least, stay the same. You can use unit metrics to gauge how efficiently your team uses technology resources, to forecast your future spending and investment needs, and to get buy-in and tell your 'IT value story' inside your organization. Unit metrics are also useful vehicles for IT teams to communicate more effectively with their finance departments. While the specifics of unit metrics vary among industries and businesses, the methodology of choosing, calculating, and using unit metrics remains the same. We recommend reading our blog post series on Unit Cost Metrics to gain a deeper understanding of the subject and learn best practices.

This blog post is a continuation of the unit metrics blog post series; here we introduce an AWS solution to help you automate the creation and tracking of unit metrics. The post not only presents a solution but also explains the thought process behind choosing meaningful unit metrics, locating data sources, extracting the data, and deploying the dashboard for any organization. In addition to discussing a step-by-step process for building the solution, we apply the process to a realistic example application so that you can see how the solution works.

Before Getting Started

Although this blog post does not require you to deploy the items below, we recommend going through the following list before moving forward with the deployment:

  1. CUDOS Dashboard is an in-depth, granular, and recommendation-driven dashboard that helps you dive deep into cost and usage. This blog post uses CUDOS to deploy the final dashboard. However, the solution presented here applies to any Cloud Financial Management (CFM) or Business Intelligence (BI) deployment that Amazon Athena supports.
  2. AWS Organizations is an AWS managed service that helps you centrally manage the AWS accounts in your organization. You can still use this solution with a single AWS account, but this blog post targets organizations made up of multiple AWS accounts.
  3. Resource tagging is a best practice for tracking and finding your resources: you assign metadata to your resources in the form of tags, which are key-value pairs. We will use tags to query business-related resources and their associated costs (see the sketch below).
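As an illustrative, hedged sketch of that workflow, the snippet below uses the AWS Cost Explorer API via boto3 (Python) to activate a user-defined cost allocation tag, so that the costs of tagged resources appear in your cost data; the tag key 'application' is a hypothetical example.

# A minimal sketch: activate a hypothetical 'application' cost allocation tag
# so that costs of resources carrying this tag can be grouped and queried.
import boto3

ce = boto3.client("ce")  # Cost Explorer

ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {"TagKey": "application", "Status": "Active"}  # hypothetical tag key
    ]
)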

Our Example Solution: Document Understanding Solution (DUS)

We chose an example application to show how to calculate unit metrics throughout this blog post: the Document Understanding Solution (DUS). This application uses AWS AI/ML, compute, database, storage, messaging, and application services to analyze text documents. We assume that DUS is an API whose users get document insights through successful calls, which drives revenue. You can either deploy this example application as explained here (optional), or follow the instructions and apply the solution to your own application. Please be mindful of the associated costs of this application, explained here.

Step one: Selecting meaningful unit metrics

A unit metric is a Key Performance Indicator (KPI) composed of a numerator that quantifies an amount of AWS spend (e.g., 'Total AWS Cost of Delivering Value') and a denominator that quantifies a demand driver (e.g., 'Total Units of X'), where 'X' is any business input that drives value for your organization, such that:

Unit Metric = Cost Data / Business Data

The denominator should have a strong statistical correlation to the numerator and should be a metric that the business, finance, or end-user functions it represents can easily relate to. Selecting the right unit cost metric(s) is the first and most important step; hence, engaging business stakeholders and finance teams early in the process helps you select the right unit metrics for your business.

For our DUS application example, the endpoint delivers value by converting documents into insights. We chose 'Number of Critical Documents Analyzed by Specialists' as a meaningful demand driver, giving the unit metric 'Cost per Critical Document Analyzed'. You can easily modify this metric to be 'per page analyzed' or 'per 100 words analyzed'. The other unit metric we chose, applicable to many applications, is 'Cost per Successful API Response'. An API-based business parameter is more flexible and applicable to other use cases.

Step two: Locating the data sources

The second step is to locate the sources of data providing the input into our unit metrics calculation; namely, data sources for cost and business data.

i) Cost data (numerator)

Your workloads may serve more than one application or be deployed across multiple environments, such as development, testing, and production. Each application or environment may have different accounting requirements. Therefore, it is important to first isolate your true application costs from your total AWS bill. One way is to focus on the production environment, which is generally the main cost driver and the one delivering business value to end users.

The process of getting the cost data is not unique to our DUS application. You can extract the cost data of applications deployed on AWS through the AWS Cost and Usage Report (AWS CUR), which contains the most comprehensive set of cost and usage data available. The CUR can be delivered to a specific Amazon S3 bucket hourly, daily, or weekly, based on your selection. You can use the AWS CUR API to create, retrieve, or delete CURs programmatically. If you have already deployed the CUDOS Dashboard as recommended in the Before Getting Started section, you already have the CUR in your S3 bucket. Otherwise, you can create a Cost and Usage Report on your own.
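As a hedged illustration of isolating production cost, the sketch below runs an Athena query against a CUR table with boto3 (Python). The database/table names (customer_cur_data.customer_all) and the 'environment' cost allocation tag are assumptions; in the CUR, tag columns surface as resource_tags_user_<key>.

import boto3

athena = boto3.client("athena")

# Hypothetical CUR database/table names; replace them with the ones from your
# own CUR or CUDOS deployment.
query = """
SELECT DATE_TRUNC('day', line_item_usage_start_date) AS usage_date,
       SUM(line_item_unblended_cost)                 AS cost
FROM customer_cur_data.customer_all
WHERE resource_tags_user_environment = 'production'
GROUP BY 1
ORDER BY 1
"""

athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},  # placeholder bucket
)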

ii) Business data (denominator)

Locating the business data may not be as straightforward as the cost data, because the sources of business-related data depend on the type of business and application. That said, there are two common approaches practiced by our customers:

  • Programmatic queries can provide business data automatically. Amazon CloudWatch, databases, data lakes, and storage are common sources for aggregate business information. These sources can be AWS resources (e.g., Amazon RDS, Amazon S3, Amazon Aurora, Amazon DynamoDB, Amazon Redshift) or external resources (e.g., via JDBC).
  • The manual/ad-hoc method may always be part of the process, since people may want to populate this data manually in MS Excel spreadsheets, CSV files, PDF files, presentations, and so on.

The above methods are not mutually exclusive; you can use them simultaneously.

Programmatic queries method: APIs are a common way to query the data, via SDKs, the AWS CLI, or the AWS Console. For example, the data can be the number of 'active users', 'sessions', or 'customer purchases', which are expected to be available in a database as part of a normal business process.

For our example DUS application, we used Amazon CloudWatch metrics to query the 'Number of Successful API Calls' made to Amazon API Gateway. We enabled Amazon API Gateway metrics to track successful API responses (e.g., those with an HTTP 2XX status code). If the metric were not available, we would create a custom CloudWatch metric and use the same method. We used the GetMetricData API with Amazon CloudWatch Metrics Insights to query on demand cost-effectively. This API is especially useful for occasional requests like ours.

Our example application did not require it, but in some cases business data may reside in databases rather than in CloudWatch. In that case, you can use AWS Lambda with Amazon EventBridge to schedule Lambda invocations and perform programmatic queries with the appropriate access, as in the sketch below. You can refer to AWS Solutions Constructs for code templates that pair AWS Lambda with data sources following deployment best practices.
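The following minimal sketch shows what such a scheduled Lambda function could look like, assuming the business metric lives in a hypothetical Amazon DynamoDB table and is written as a dated CSV object to the business metrics bucket used later in this post; the table name, column names, and object key layout are illustrative, not part of the solution.

import csv
import datetime
import io

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

BUCKET = "business-metric-bucket"    # S3 bucket created in the deployment step below
TABLE = "document-analysis-results"  # hypothetical source table

def handler(event, context):
    # Example query: count the items in the table. Replace this with whatever
    # query returns your business metric from your own data source.
    count = dynamodb.scan(TableName=TABLE, Select="COUNT")["Count"]

    # Write the metric as a one-row CSV object, keyed by date.
    today = datetime.date.today().isoformat()
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "documents_analyzed"])
    writer.writerow([today, count])

    s3.put_object(
        Bucket=BUCKET,
        Key=f"business-metrics/{today}.csv",
        Body=buffer.getvalue(),
    )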

Manual/ad-hoc method: You may have a specialized team who populates business metrics manually, in addition or in parallel to the programmatic method. Regardless of the method you use, we recommend extracting business data into a file (or files) in CSV (Comma-Separated Values) format, which is row-based and has native integration with commercial spreadsheet software (i.e., 'Save As' in MS Excel). We also used a simple table schema, with the first column giving the date of the business metric and the other columns giving business values. While we used CSV files and a simple table to represent business metrics over time, you have the flexibility to employ your own schema and supported file formats, thanks to AWS Glue Crawler capabilities.

For our example DUS application, 'Number of Critical Documents Analyzed by Specialists' represents the manual method. We assumed that specialists compile this data in an imaginary manual step and save it in Excel/CSV file(s), following the sample schema shown below.
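For illustration, a CSV file following this simple schema could look like the sample below (dates and counts are hypothetical):

date,critical_documents_analyzed
2023-01-31,1200
2023-02-28,1350
2023-03-31,1510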

Step three: Deploying the components

Figure 1 shows the architecture of the deployment. This solution has four blocks: the CUDOS Dashboard, Data Sources for Business Metrics, Business Data Collection, and your application (i.e., DUS in our case).

  • CUDOS Dashboard is the starting point, as explained in the Before Getting Started section.
  • Data Sources for Business Metrics hold the information about the business value generated during normal operations (e.g., number of customers, number of transactions, etc.).
  • Business Data Collection includes AWS Lambda (executor) and Amazon EventBridge (orchestrator) to query business data from the data sources for business metrics. We use the same AWS Lambda function to query the sources and to create/put CSV files into the Amazon S3 bucket from the CUDOS deployment.
  • Your Application is the application that serves your end users on AWS; for our scenario, it is the DUS application. Note that our architecture draws a clear boundary between the data sources for business metrics and the application; this does not need to be the case, as some or all of the data can reside in the application itself.

[Architecture diagram: the types of data and their flow from your application and the data sources for business metrics into the CUDOS Dashboard deployment.]

Figure 1. Solution architecture for the AWS Solution: A Unit Metrics Enhanced CUDOS Dashboard deployment

i) Storing business data as standardized CSV file(s) in an S3 bucket

You may start by creating an S3 bucket (e.g., business-metric-bucket) to store the CSV files for all business data, including the manually curated CSV file.

  • If required, you can download a manually curated example business data file here (DocumentBusinessMetricsData.csv). You can manually upload the file (DocumentBusinessMetricsData.csv) into the S3 bucket you created (business-metric-bucket), or you can use AWS Lambda and Amazon EventBridge to upload the business data programmatically and/or automatically. You can also create and upload CSV files using different schemas to experiment.

For the business metrics available in CloudWatch, you can use an AWS Lambda function that calls the GetMetricData API to query your business metric. Since our example application (DUS) uses API Gateway, we used the query below to sum all 'Successful API Calls' for our ApiName = 'dusstack-dusapi-fqvfbqjcsj2fxh9pu242th'.

SELECT SUM(Count) FROM SCHEMA("AWS/ApiGateway", ApiName, Stage)
WHERE ApiName = 'dusstack-dusapi-fqvfbqjcsj2fxh9pu242th' AND Stage = 'prod'
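As a minimal sketch, assuming the same example API name, the Lambda function can run this Metrics Insights query through the GetMetricData API with boto3; the 30-day window and daily period are illustrative choices to adapt to your review cadence.

import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

# The Metrics Insights query from above, passed as a GetMetricData expression.
QUERY = (
    "SELECT SUM(Count) FROM SCHEMA(\"AWS/ApiGateway\", ApiName, Stage) "
    "WHERE ApiName = 'dusstack-dusapi-fqvfbqjcsj2fxh9pu242th' AND Stage = 'prod'"
)

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(days=30)  # assumed reporting window

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {"Id": "successful_api_calls", "Expression": QUERY, "Period": 86400}
    ],
    StartTime=start,
    EndTime=end,
)

# One value per day: the number of successful API calls.
result = response["MetricDataResults"][0]
for timestamp, value in zip(result["Timestamps"], result["Values"]):
    print(timestamp.date(), value)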

ii) Extracting CSV data into Athena tables using AWS Glue Crawler

The next step is to create the AWS Glue Crawler. Configure the S3 bucket (business-metric-bucket) as the source, and name the output table 'unitmetrics-externalsource'. You can run the crawler on demand, or based on events or schedules managed by EventBridge. The common practice is to use scheduled jobs that kick off the whole process, extending to the visualization of the metrics, as part of your business review cadence (weekly, monthly, quarterly, etc.). You can still run the process on demand, or in an event-driven manner based on business events such as the manual upload of a new CSV file to the S3 bucket (previous step).

The AWS Glue Crawler runs and creates the output table mentioned above (unitmetrics-externalsource) based on the uploaded CSV files.
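If you prefer to script this step rather than use the console, a minimal sketch with boto3 is shown below; the crawler name, IAM role ARN, Glue database name, and monthly cron schedule are placeholder assumptions.

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="unitmetrics-crawler",                             # placeholder name
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",  # placeholder role ARN
    DatabaseName="unitmetrics",                             # placeholder Glue database
    Targets={"S3Targets": [{"Path": "s3://business-metric-bucket/"}]},
    Schedule="cron(0 6 1 * ? *)",  # e.g., monthly, matching a monthly review cadence
)
glue.start_crawler(Name="unitmetrics-crawler")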

iii) Creating an Athena View to consolidate all data

  • You can find the data sources established by the AWS Glue Crawler under the Query Editor in Amazon Athena, as shown in Figures 2(a) and 2(b) below. Here, you have two data sources: one from Amazon CloudWatch metrics (cloudwatchmetrics-csv) and one from the external CSV files (AwsDataCatalog). The first should show two tables, metric_samples and metrics. The second should show one table, unitmetrics_custom.
  • You can create an Amazon Athena view, a logical table formed by your SQL query, to consolidate these data sources (see the sketch after this list).
  • Once you have created the view, you can move to the Amazon QuickSight dashboard section below. If you use another BI tool, you can calculate your unit metric in the Amazon Athena view and use it in that tool.
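Below is a minimal sketch of creating such a view programmatically with boto3. All table, column, and bucket names here are hypothetical stand-ins; adapt the JOIN to the schemas your crawler and CUDOS deployment actually produce.

import boto3

athena = boto3.client("athena")

# Hypothetical schema: join daily business volumes to daily cost so that a
# BI tool can compute unit metrics (cost / business value) per day.
CREATE_VIEW = """
CREATE OR REPLACE VIEW unitmetrics_view AS
SELECT b."date" AS metric_date,
       b.critical_documents_analyzed,
       c.cost,
       c.cost / b.critical_documents_analyzed AS cost_per_document
FROM unitmetrics_custom b
JOIN daily_application_cost c  -- hypothetical daily-cost table from the CUR
  ON b."date" = c.usage_date
"""

athena.start_query_execution(
    QueryString=CREATE_VIEW,
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},  # placeholder
)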


Figure 2. Data sources and tables created by Glue Crawler for (a) CloudWatch metrics and (b) external file (csv file/spreadsheet)

iv) Building the unit metrics dashboard on Amazon QuickSight

At this stage, you have the necessary tables and connectors deployed. Now it is time to connect the data, calculate the unit metrics, and build a dashboard on Amazon QuickSight. To perform this step, create a QuickSight dataset backed by the Athena view you created above, and then add the unit metrics to your analysis as calculated fields (cost divided by business value).

We created two example dashboards as shown in Figure 3.

[Two example dashboard plots: (a) a pie chart breaking the API cost down by service; (b) a time-based plot showing unit cost decreasing over time even as total cost increases.]

Figure 3. (a) Unit cost per 1000 successful API responses; and (b) Timeline graph of total application cost vs. unit cost

Uninstalling the Example Application

If you deployed the example DUS application, you can follow the instructions to uninstall the application.

Business interpretation of unit cost metrics results

Using unit metrics gives you useful insights into the efficiency of your operations, architecture, and business.

One of the main uses of unit metrics is to put the total cost into context, as shown in Figure 3b. Here, the 'Total Cost' (of the application) and the 'Unit Cost' (the unit metric 'Cost per Critical Document Analyzed') are shown together on a timeline chart, where total cost steadily goes up yet unit cost goes down and starts to converge. This is a common pattern that helps explain that total cost is driven by effective business growth (e.g., more documents being processed year after year), as opposed to other factors (e.g., avoidable waste, workload migrations into AWS, etc.); business owners can thus demonstrate an increase in efficiency from better cost control practices and economies of scale. As discussed in our blog post series on unit metrics, it is acceptable to see a temporary increase in unit cost as you migrate workloads to the cloud and/or apply required adaptations to your application (e.g., technical bug fixes, legal requirements, etc.).

Another good practice is to transform the original unit metric cost (e.g., Cost per Successful API Response, or $0.0021 in our case) into a more ‘user-friendly / human-readable’ version (e.g., cost per 1000 successful API responses) to display the cost breakdown as shown in Figure 3a, in this case a cost of $2.1 per 1000 API calls.

One step further would be to obtain the 'unit revenue' associated with your business drivers (e.g., 'Revenue per Document Analyzed' in our case); with that information you can estimate 'profit per transaction' and display it on the dashboard.

What’s next

This blog post described in detail how to select, calculate, and deploy a dashboard for unit metrics on AWS, providing step-by-step guidance through an example enterprise application.

As a best practice, we recommend that business owners establish a lifecycle management process by continuously reviewing the unit metrics for their applications, either by improving the existing unit metrics (e.g., fine-tuning the numerator with a more accurate cost allocation process, or enhancing the correlation of the denominator with a more suitable demand driver) and/or by incorporating additional ones (e.g., looking at sub-sections of the application, or accounting for new applications altogether). This enables organizations to perform unbiased financial assessments, better understand business profitability, and ultimately unlock value realization for their applications on AWS.

About the authors:

Burak Gozluklu

Burak is a Principal ML Specialist Solutions Architect located in Boston, MA. Burak has more than 15 years of industry experience in simulation modeling, data science, and ML technology. He helps global customers adopt AWS technologies, especially AI/ML solutions, to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, and an MS in Systems Engineering and a post-doc on system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.

Sid Pasumarthy

Siddharth is a Solutions Architect based out of Seattle. He works with enterprise customers in the retail, gaming, and media/entertainment industries to help them adopt the cloud. He has a B.S. in Architecture from the Indian Institute of Technology and an M.S. in Information Systems from the Kelley School of Business. In addition to keeping up to date with technology, he is passionate about the arts and loves painting with acrylics.

Carlos Galan

Carlos is a Business Development Manager within the Cloud Financial Management (FinOps) practice, based in London. He supports Enterprise, ISV, and SMB customers in realizing the business value of AWS by understanding and implementing the FinOps essentials: tracking, allocating, optimizing, forecasting, controlling, and governing their cloud spend. He has a degree in Electronics Engineering from the Universidad Simon Bolivar and an MBA from IESE Business School.

Aditya Pendyala

Aditya is a Senior Solutions Architect at AWS, based out of NYC. He has extensive experience in architecting cloud-based applications. He is currently working with large enterprises to help them craft highly scalable, flexible, and resilient cloud architectures, and guides them on all things cloud. He has a Master of Science degree in Computer Science from Shippensburg University and believes in the quote, "When you cease to learn, you cease to grow."