AWS Cloud Operations & Migrations Blog
Cost optimization in AWS using Amazon CloudWatch metric streams, AWS Cost and Usage Reports and Amazon Athena
You can use metric streams to create continuous, near-real-time streams of Amazon CloudWatch metrics to a destination of your choice. Metric streams make it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream that includes the most up-to-date CloudWatch metrics to power dashboards, alarms, and other tools that rely on accurate and timely metric data.
You can use metric streams to send metrics to partner solutions, including Datadog, Dynatrace, New Relic, Splunk, and Sumo Logic. Alternatively, you can send metrics to your data lake built on AWS, such as on Amazon Simple Storage Service (Amazon S3). You can continuously ingest monitoring data and combine billing and performance data with the latest CloudWatch metric data to create rich datasets. You can then use Amazon Athena to get insights into cost optimization, resource performance, and resource utilization. Metric streams are fully managed, scalable, and easy to set up.
In this post, I will show you how to store data from metric streams in an S3 bucket and then use Amazon Athena to analyze metrics for Amazon Elastic Compute Cloud (Amazon EC2). I will also show you how to look for opportunities to correlate the EC2 metric data with the AWS Cost and Usage Report.
Figure 1 shows the architecture for the solution I discuss in this post.
Figure 1: Solution architecture
The workflow includes the following steps:
- Amazon CloudWatch metrics are streamed to a Kinesis Data Firehose delivery stream, which delivers the data to an S3 bucket.
- AWS Cost and Usage Reports publishes billing reports to an S3 bucket.
- AWS Glue crawlers are used to discover the schema for both datasets.
- Amazon Athena is used to query the data for metric streams and AWS Cost and Usage Reports.
- (Optional) Amazon QuickSight is used to build dashboards.
Before you begin, enable the AWS Cost and Usage Report for your AWS account.
Create a metric stream (All metrics)
To get started, in the left navigation pane of the Amazon CloudWatch console, expand Metrics, choose Streams, and then choose Create metric stream.
Alternatively, you can use the CloudWatch API, AWS SDK, AWS CLI, or AWS CloudFormation to provision and configure metric streams. Metric streams support OpenTelemetry and JSON output formats.
Figure 2: CloudWatch metric streams in the CloudWatch console
By default, metric streams send data from all metrics in the AWS account to the configured destination. You can use filters to limit the metrics that are being streamed. On the Create a metric stream page, leave the default selections.
Figure 3: Create a metric stream
A unique name will be generated for the stream. The console will also create an S3 bucket with a unique name to store the metrics, an AWS Identity and Access Management (IAM) role for writing to S3, and a Kinesis Data Firehose delivery stream to deliver metrics to S3.
Figure 4: Resources to be added to the account
Choose Create metric stream.
Figure 5: Metric streams tab
Create a metric stream (Selected namespaces)
You can also create a metric stream to capture CloudWatch metrics specific to a service like Amazon EC2. On the Create a metric stream page, under Metrics to be streamed, choose Selected namespaces and then select EC2, as shown in Figure 6:
Figure 6: Creating a metric stream for EC2 metrics only
You now have two metric streams: one that has data for all metrics and one that has data for EC2 metrics only.
Figure 7: Metric streams displayed in the console
Set up AWS Glue and Amazon Athena
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities required for data integration so that you can start analyzing your data and putting it to use in minutes.
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. With just a few actions in the AWS console, you can point Athena at your data stored in Amazon S3 and start using standard SQL to run ad-hoc queries and get results in seconds. Athena natively supports querying datasets and data sources that are registered with the AWS Glue Data Catalog.
You will use AWS Glue to connect to your metric data sources in S3.
- In the AWS Glue console, choose Crawlers, and then choose Add crawler.
- For Crawler name, enter a name for the crawler.
- On Specify crawler source type, leave the default options.
- On Add a data store, for Choose a data store, choose S3. Include the path to the bucket that has data for all metrics.
- On Add another data store, choose No.
- Create an IAM role that will be used by AWS Glue. Make sure that the IAM role has access to the AWSGlueServiceRole managed policy and the S3 bucket.
- Create a schedule with an hourly frequency.
- On Configure the crawler’s output, for Database, choose default.
- Create the crawler, and then choose Run crawler.
When you’re done, you can look at the discovered table schema from the AWS Glue Data Catalog. Make a note of the location of the S3 bucket that has metrics from all namespaces.
Figure 8: Table details in the AWS Glue Data Catalog
Now create a table in the AWS Glue Data Catalog for EC2 metrics. Repeat steps 1-9. In step 4, make sure you use the S3 bucket location for EC2 metrics.
Figure 9: Table details in the AWS Glue Data Catalog
For both AWS Glue Data Catalog tables, the timestamp column is recognized as bigint by default. Later in the post, you'll write Athena queries. To make that task easier, you can manually change the data type to timestamp. For each AWS Glue Data Catalog table, choose Edit schema and change the timestamp column to the timestamp data type.
Figure 10: Table definition and schema details
Edit and run the crawlers again to update all the partitions with the new schema. Under Configuration options, choose Ignore the change and don’t update the table in the data catalog and then select the Update all new and existing partitions with metadata from the table checkbox, as shown in Figure 11.
Figure 11: Configure the crawler’s output
Run the AWS Glue crawler again with these settings. You’re now ready to start analyzing your metric streams data with Amazon Athena.
Run queries in Athena
Open the Athena console and start running queries on metric streams data.
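As a quick first check, you can preview a few rows of the streamed metrics. This is a sketch: the database and table name below are placeholders for the ones the AWS Glue crawler actually created in your account.

```sql
-- Preview the first few records of the metric streams table
SELECT *
FROM "default"."metricstreams_all"
LIMIT 10;
```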
Figure 12: Previewing data in metric streams table using Amazon Athena
Query 1: Find average and max CPU utilization for a given instance ID
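The exact query from Figure 13 isn't reproduced in the text; a sketch of what it might look like follows. The table name (metricstreams_all) and instance ID are placeholders, and the column names (metric_name, dimensions, value, timestamp) are assumptions based on the schema a Glue crawler typically discovers for metric streams JSON output, where the value struct carries min, max, sum, and count statistics.

```sql
-- Average and maximum CPU utilization for a single EC2 instance.
-- Average = sum of datapoints / number of datapoints in each record.
SELECT dimensions.instanceid           AS instance_id,
       AVG(value.sum / value.count)    AS avg_cpu_utilization,
       MAX(value.max)                  AS max_cpu_utilization
FROM "default"."metricstreams_all"
WHERE namespace = 'AWS/EC2'
  AND metric_name = 'CPUUtilization'
  AND dimensions.instanceid = 'i-0123456789abcdef0'   -- placeholder instance ID
GROUP BY dimensions.instanceid;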
Figure 13: Using Athena to find average and max CPU utilization of an EC2 instance
Query 2: Find average CPU utilization with data aggregated across five-minute intervals
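A sketch of a five-minute aggregation, assuming you changed the timestamp column to the timestamp data type as described earlier (table name and instance ID are placeholders):

```sql
-- Bucket datapoints into five-minute (300-second) intervals
SELECT from_unixtime(floor(to_unixtime("timestamp") / 300) * 300) AS time_bin,
       AVG(value.sum / value.count)                               AS avg_cpu_utilization
FROM "default"."metricstreams_all"
WHERE namespace = 'AWS/EC2'
  AND metric_name = 'CPUUtilization'
  AND dimensions.instanceid = 'i-0123456789abcdef0'   -- placeholder instance ID
GROUP BY 1
ORDER BY 1;
```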
Figure 14: Using Athena to find average CPU utilization of an EC2 instance aggregated over five minutes
Query 3: Find average CPU utilization of all EC2 instances in an AWS Region. Run this query against the Athena table with EC2-only metrics.
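A sketch of this query against the EC2-only table (the table name is a placeholder for the one your crawler created; a namespace filter isn't needed because this stream contains only AWS/EC2 metrics):

```sql
-- Average CPU utilization across all EC2 instances in a Region
SELECT region,
       AVG(value.sum / value.count) AS avg_cpu_utilization
FROM "default"."metricstreams_ec2"
WHERE metric_name = 'CPUUtilization'
  AND region = 'us-east-2'
GROUP BY region;
```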
Figure 15: Average CPU utilization of all EC2 instances in a Region
Correlate with AWS Cost and Usage Reports
The AWS Cost and Usage Report contains the most comprehensive information available on your costs and usage. For information about how you can quickly and easily enable, configure, and query your AWS cost and usage information using Athena, see the Querying your AWS Cost and Usage Report using Amazon Athena blog post and Querying Cost and Usage Reports using Amazon Athena in the AWS Cost and Usage Report User Guide.
Query 4: Find EC2 instances running in your account in the us-east-2 Region and the amount you’re paying on an hourly basis
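A sketch of this query against the Cost and Usage Report table registered in Athena. The table name is a placeholder; the column names follow the standard CUR Athena naming convention (line_item_*, product_*), but verify them against your own table schema, which depends on your report configuration.

```sql
-- Hourly blended cost per running EC2 instance in us-east-2
SELECT line_item_usage_start_date AS hour_start,
       line_item_resource_id      AS instance_id,
       product_instance_type      AS instance_type,
       line_item_blended_cost     AS blended_cost
FROM "default"."cost_and_usage_report"
WHERE line_item_product_code = 'AmazonEC2'
  AND product_region = 'us-east-2'
  AND line_item_usage_type LIKE '%BoxUsage%'   -- On-Demand instance-hours
ORDER BY line_item_usage_start_date;
```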
When you run this query in Athena, you see multiple t3.small instances are running every hour in us-east-2. Note the line_item_blended_cost (0.0208 USD/hour).
Figure 16: Analyze EC2 hourly charges using Athena
You can aggregate the cost data from this query across all EC2 instances running in a Region and compare it with the average CPU utilization of EC2 instances.
Query 5: Aggregate cost data across all EC2 instances and compare it with average CPU utilization of instances
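One way to sketch this comparison is to aggregate both datasets by hour and join them. Table and column names are the same placeholders and assumptions as in the earlier queries, and the query assumes the metric streams timestamp column has been converted to the timestamp type:

```sql
-- Hourly EC2 spend joined with hourly average CPU utilization
WITH hourly_cost AS (
    SELECT line_item_usage_start_date      AS hour_start,
           SUM(line_item_blended_cost)     AS total_cost
    FROM "default"."cost_and_usage_report"
    WHERE line_item_product_code = 'AmazonEC2'
      AND product_region = 'us-east-2'
      AND line_item_usage_type LIKE '%BoxUsage%'
    GROUP BY 1
),
hourly_cpu AS (
    SELECT date_trunc('hour', "timestamp") AS hour_start,
           AVG(value.sum / value.count)    AS avg_cpu_utilization
    FROM "default"."metricstreams_ec2"
    WHERE metric_name = 'CPUUtilization'
    GROUP BY 1
)
SELECT c.hour_start,
       c.total_cost,
       u.avg_cpu_utilization
FROM hourly_cost c
JOIN hourly_cpu  u ON c.hour_start = u.hour_start
ORDER BY c.hour_start;
```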
From the query result, you can see that you're spending roughly 0.25 USD every hour on EC2 instances with an average CPU utilization across all instances of approximately 2%.
Figure 17: Use Athena to compare EC2 resource utilization and costs
Because overall CPU utilization for all EC2 instances is fairly low at 2%, there might be an opportunity for you to right-size some of the instances.
Query 6: Find average and max CPU utilization of each instance running in this Region
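A sketch of this per-instance breakdown (same placeholder table name and assumed columns as before):

```sql
-- Average and maximum CPU utilization per instance, least utilized first
SELECT dimensions.instanceid        AS instance_id,
       AVG(value.sum / value.count) AS avg_cpu_utilization,
       MAX(value.max)               AS max_cpu_utilization
FROM "default"."metricstreams_ec2"
WHERE metric_name = 'CPUUtilization'
GROUP BY dimensions.instanceid
ORDER BY avg_cpu_utilization;
```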
Figure 18: Average and max CPU utilization for all instances in a Region
You can look at this query result to find out which of the EC2 instances need to be downsized. Some of the EC2 instances in this example aren't being used to capacity, which means you can use smaller-sized instances instead. Figure 18 orders results by datetime; you can also order them by CPU utilization to surface the least utilized instances first.
After you have completed the right-sizing, run Query 5 again to see if the hourly cost numbers are trending down. In Figure 19, the cost column shows a decrease in overall spend.
Figure 19: Use Athena to compare EC2 resource utilization and costs
Clean up
To avoid ongoing charges to your account, delete the resources you created in this walkthrough.
- Delete the metric streams, the corresponding Kinesis Data Firehose delivery streams, and the IAM role.
- Delete the AWS Glue crawlers and tables in the AWS Glue Data Catalog.
- Delete the S3 buckets where the metric data is stored.
In this post, I showed you how to use metric streams to export CloudWatch metrics to S3. You used Athena to look at metrics like EC2 CPU utilization. You learned how to use Athena to combine data from the AWS Cost and Usage Report and metric streams to examine historical trends and perform right-sizing and cost optimization. You might want to use Amazon QuickSight to create dashboards from queries you have written in Athena.
Metric streams can help you run reports on other AWS services, including Amazon RDS, Amazon Elastic Block Store, S3, and AWS Lambda.