AWS Cloud Operations Blog
Cost optimization in AWS using Amazon CloudWatch metric streams, AWS Cost and Usage Reports, and Amazon Athena
You can use metric streams to create continuous, near-real-time streams of Amazon CloudWatch metrics to a destination of your choice. Metric streams make it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream that includes the most up-to-date CloudWatch metrics to power dashboards, alarms, and other tools that rely on accurate and timely metric data.
You can use metric streams to send metrics to partner solutions, including Datadog, Dynatrace, New Relic, Splunk, and Sumo Logic. Alternatively, you can send metrics to your data lake built on AWS, such as on Amazon Simple Storage Service (Amazon S3). You can continuously ingest monitoring data and combine billing and performance data with the latest CloudWatch metric data to create rich datasets. You can then use Amazon Athena to get insights into cost optimization, resource performance, and resource utilization. Metric streams are fully managed, scalable, and easy to set up.
In this post, I will show you how to store data from metric streams in an S3 bucket and then use Amazon Athena to analyze metrics for Amazon Elastic Compute Cloud (Amazon EC2). I will also show you how to look for opportunities to correlate the EC2 metric data with the AWS Cost and Usage Report.
Figure 1 shows the architecture for the solution I discuss in this post.
Figure 1: Solution architecture
The workflow includes the following steps:
- Amazon CloudWatch metrics data is streamed to a Kinesis Data Firehose delivery stream, which then delivers the data to an S3 bucket.
- The AWS Cost and Usage Report is delivered to an S3 bucket.
- AWS Glue crawlers are used to discover the schema for both datasets.
- Amazon Athena is used to query the data for metric streams and AWS Cost and Usage Reports.
- (Optional) Amazon QuickSight is used to build dashboards.
Prerequisite
Enable AWS Cost and Usage Report for your AWS account.
Walkthrough
Create a metric stream (All metrics)
To get started, in the left navigation pane of the Amazon CloudWatch console, expand Metrics, choose Streams, and then choose Create metric stream.
Alternatively, you can use the CloudWatch API, AWS SDK, AWS CLI, or AWS CloudFormation to provision and configure metric streams. Metric streams support OpenTelemetry and JSON output formats.
Figure 2: CloudWatch metric streams in the CloudWatch console
By default, metric streams send data from all metrics in the AWS account to the configured destination. You can use filters to limit the metrics that are being streamed. On the Create a metric stream page, leave the default selections.
Figure 3: Create a metric stream
A unique name will be generated for the stream. The console will also create an S3 bucket with a unique name to store the metrics, an AWS Identity and Access Management (IAM) role that allows writing to the bucket, and a Kinesis Data Firehose delivery stream that delivers the metrics to S3.
Figure 4: Resources to be added to the account
Choose Create metric stream.
Figure 5: Metric streams tab
Create a metric stream (Selected namespaces)
You can also create a metric stream to capture CloudWatch metrics specific to a service like Amazon EC2. On the Create a metric stream page, under Metrics to be streamed, choose Selected namespaces and then select EC2, as shown in Figure 6:
Figure 6: Creating a metric stream for EC2 metrics only
You now have two metric streams: one that has data for all metrics and one that has data for EC2 metrics only.
Figure 7: Metric streams displayed in the console
Set up AWS Glue and Amazon Athena
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities required for data integration so that you can start analyzing your data and putting it to use in minutes.
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. With just a few actions in the AWS console, you can point Athena at your data stored in Amazon S3 and start using standard SQL to run ad-hoc queries and get results in seconds. Athena natively supports querying datasets and data sources that are registered with the AWS Glue Data Catalog.
You will use AWS Glue to connect to your metric data sources in S3.
1. In the AWS Glue console, choose Crawlers, and then choose Add crawler.
2. For Crawler name, enter metric-stream-full.
3. On Specify crawler source type, leave the default options.
4. On Add a data store, for Choose a data store, choose S3. Include the path to the bucket that has data for all metrics.
5. On Add another data store, choose No.
6. Create an IAM role that will be used by AWS Glue. Make sure that the IAM role has access to the AWSGlueServiceRole managed policy and the S3 bucket.
7. Create a schedule with an hourly frequency.
8. On Configure the crawler’s output, for Database, choose default.
9. Create the crawler, and then choose Run crawler.
When you’re done, you can look at the discovered table schema from the AWS Glue Data Catalog. Make a note of the location of the S3 bucket that has metrics from all namespaces.
Figure 8: Table details in the AWS Glue Data Catalog
Now create a table in the AWS Glue Data Catalog for EC2 metrics. Repeat steps 1-9, using a different crawler name. In step 4, make sure you use the S3 bucket location for EC2 metrics.
Figure 9: Table details in the AWS Glue Data Catalog
For both AWS Glue Data Catalog tables, the timestamp column is recognized as bigint by default. Later in the post, you’ll write Athena queries. To make that task easier, you can manually change the data type to timestamp. For each AWS Glue Data Catalog table, choose Edit schema and change the timestamp column to the timestamp data type.
Figure 10: Table definition and schema details
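If you prefer to leave the column as bigint, the alternative is to convert it in every query. Here is a minimal sketch of that conversion, assuming the stream writes timestamps as epoch milliseconds; the table name is a placeholder for the name the crawler registered.

```sql
-- Convert the raw epoch-millisecond timestamp to a SQL timestamp at query time
-- (only needed if you leave the column typed as bigint).
SELECT from_unixtime("timestamp" / 1000) AS metric_time,
       metric_name,
       value.sum / value.count           AS avg_value
FROM metric_stream_full
LIMIT 10;
```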
Edit and run the crawlers again to update all the partitions with the new schema. Under Configuration options, choose Ignore the change and don’t update the table in the data catalog and then select the Update all new and existing partitions with metadata from the table checkbox, as shown in Figure 11.
Figure 11: Configure the crawler’s output
Run the AWS Glue crawler again with these settings. You’re now ready to start analyzing your metric streams data with Amazon Athena.
Run queries in Athena
Open the Athena console and start running queries on metric streams data.
Figure 12: Previewing data in metric streams table using Amazon Athena
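If you want to sanity-check the schema the crawler discovered, a simple preview query is enough. The table name below is a placeholder for whatever name the crawler registered in the default database.

```sql
-- Preview the streamed metric records.
SELECT *
FROM metric_stream_full
LIMIT 10;
```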
Query 1: Find average and max CPU utilization for a given instance ID
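A minimal sketch of such a query follows. The table name (metric_stream_full) and instance ID are placeholders, and the value struct with min, max, sum, and count fields reflects the metric stream's JSON output format, where the per-datapoint average is sum divided by count.

```sql
-- Per-datapoint average (sum/count) and max CPU utilization for one instance.
-- Table name and instance ID are placeholders.
SELECT "timestamp",
       value.sum / value.count AS avg_cpu_utilization,
       value.max               AS max_cpu_utilization
FROM metric_stream_full
WHERE namespace = 'AWS/EC2'
  AND metric_name = 'CPUUtilization'
  AND dimensions.instanceid = 'i-0123456789abcdef0'
ORDER BY "timestamp";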
Figure 13: Using Athena to find average and max CPU utilization of an EC2 instance
Query 2: Find average CPU utilization with data aggregated across five-minute intervals
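One way to bucket the datapoints into five-minute windows is to round the timestamp down to a multiple of 300 seconds. The table name and instance ID are again placeholders.

```sql
-- Average CPU utilization in five-minute buckets for one instance.
SELECT from_unixtime(floor(to_unixtime("timestamp") / 300) * 300) AS five_minute_window,
       AVG(value.sum / value.count)                               AS avg_cpu_utilization
FROM metric_stream_full
WHERE namespace = 'AWS/EC2'
  AND metric_name = 'CPUUtilization'
  AND dimensions.instanceid = 'i-0123456789abcdef0'
GROUP BY 1
ORDER BY 1;
```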
Figure 14: Using Athena to find average CPU utilization of an EC2 instance aggregated over five minutes
Query 3: Find average CPU utilization of all EC2 instances in an AWS Region. Run this query against the Athena table with EC2-only metrics.
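A sketch against the EC2-only table (named metric_stream_ec2 here as a placeholder) might look like this:

```sql
-- Average CPU utilization across all EC2 instances in one Region,
-- using the table built from the EC2-only metric stream.
SELECT region,
       AVG(value.sum / value.count) AS avg_cpu_utilization
FROM metric_stream_ec2
WHERE metric_name = 'CPUUtilization'
  AND region = 'us-east-2'
GROUP BY region;
```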
Figure 15: Average CPU utilization of all EC2 instances in a Region
Correlate with AWS Cost and Usage Reports
The AWS Cost and Usage Report contains the most comprehensive information available on your costs and usage. For information about how you can quickly and easily enable, configure, and query your AWS cost and usage information using Athena, see the Querying your AWS Cost and Usage Report using Amazon Athena blog post and Querying Cost and Usage Reports using Amazon Athena in the AWS Cost and Usage Report User Guide.
Query 4: Find EC2 instances running in your account in the us-east-2 Region and the amount you’re paying on an hourly basis
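Assuming you followed the CUR-Athena integration linked above, the report lands in a table (called cost_and_usage_report here as a placeholder) with columns such as line_item_product_code and line_item_blended_cost. A sketch that limits the results to EC2 instance-hour (BoxUsage) line items in us-east-2:

```sql
-- Hourly cost per EC2 instance in us-east-2 from the Cost and Usage Report.
-- The table name is a placeholder for the one created by your CUR-Athena setup.
SELECT line_item_usage_start_date,
       line_item_resource_id,
       product_instance_type,
       line_item_blended_cost
FROM cost_and_usage_report
WHERE line_item_product_code = 'AmazonEC2'
  AND product_region = 'us-east-2'
  AND line_item_usage_type LIKE '%BoxUsage%'
ORDER BY line_item_usage_start_date;
```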
When you run this query in Athena, you see multiple t3.small instances are running every hour in us-east-2. Note the line_item_blended_cost (0.0208 USD/hour).
Figure 16: Analyze EC2 hourly charges using Athena
You can aggregate the cost data from this query across all EC2 instances running in a Region and compare it with the average CPU utilization of EC2 instances.
Query 5: Aggregate cost data across all EC2 instances and compare it with average CPU utilization of instances
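One way to line the two datasets up is to aggregate both to the hour and join them on the hour boundary. This sketch reuses the placeholder table names from the earlier queries.

```sql
-- Compare hourly EC2 spend (from the CUR) with average CPU utilization
-- (from the EC2 metric stream). Table names are placeholders.
WITH hourly_cost AS (
    SELECT line_item_usage_start_date  AS usage_hour,
           SUM(line_item_blended_cost) AS total_cost
    FROM cost_and_usage_report
    WHERE line_item_product_code = 'AmazonEC2'
      AND product_region = 'us-east-2'
      AND line_item_usage_type LIKE '%BoxUsage%'
    GROUP BY line_item_usage_start_date
),
hourly_cpu AS (
    SELECT date_trunc('hour', "timestamp") AS usage_hour,
           AVG(value.sum / value.count)    AS avg_cpu_utilization
    FROM metric_stream_ec2
    WHERE metric_name = 'CPUUtilization'
      AND region = 'us-east-2'
    GROUP BY date_trunc('hour', "timestamp")
)
SELECT c.usage_hour,
       c.total_cost,
       u.avg_cpu_utilization
FROM hourly_cost c
JOIN hourly_cpu u ON c.usage_hour = u.usage_hour
ORDER BY c.usage_hour;
```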
From the result of the query, you can see that you’re spending roughly 0.25 USD every hour on EC2 instances, with an average CPU utilization across all instances of approximately 2%.
Figure 17: Use Athena to compare EC2 resource utilization and costs
Because overall CPU utilization for all EC2 instances is fairly low at 2%, there might be an opportunity for you to right-size some of the instances.
Query 6: Find average and max CPU utilization of each instance running in this Region
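A sketch that aggregates per instance and hour follows (placeholder table name again); as noted after the figure, you can swap the ORDER BY to sort by CPU utilization instead.

```sql
-- Per-instance average and max CPU utilization, aggregated by hour,
-- to identify candidates for right-sizing.
SELECT dimensions.instanceid           AS instance_id,
       date_trunc('hour', "timestamp") AS metric_hour,
       AVG(value.sum / value.count)    AS avg_cpu_utilization,
       MAX(value.max)                  AS max_cpu_utilization
FROM metric_stream_ec2
WHERE metric_name = 'CPUUtilization'
  AND region = 'us-east-2'
GROUP BY dimensions.instanceid, date_trunc('hour', "timestamp")
ORDER BY metric_hour;
```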
Figure 18: Average and max CPU utilization for all instances in a Region
You can look at this query result to find out which of the EC2 instances should be downsized. Some of the EC2 instances in this example aren’t being used to capacity, which means you can use smaller-sized instances instead. You can also order the results by CPU utilization instead of datetime, which is shown in Figure 18.
After you have completed the right-sizing, you can execute Query 5 to see if the hourly cost numbers are trending down. In Figure 19, the cost column shows a decrease in overall spend.
Figure 19: Use Athena to compare EC2 resource utilization and costs
Cleanup
To avoid ongoing charges to your account, delete the resources you created in this walkthrough.
- Delete the metric streams, the corresponding Kinesis Data Firehose delivery streams, and the associated IAM roles.
- Delete the AWS Glue crawlers and tables in the AWS Glue Data Catalog.
- Delete the S3 buckets where the metric data is stored.
Conclusion
In this post, I showed you how to use metric streams to export CloudWatch metrics to S3. You used Athena to look at metrics like EC2 CPU utilization. You learned how to use Athena to combine data from the AWS Cost and Usage Report and metric streams to examine historical trends and perform right-sizing and cost optimization. You might want to use Amazon QuickSight to create dashboards from queries you have written in Athena.
Metric streams can help you run reports on other AWS services, including Amazon RDS, Amazon Elastic Block Store, S3, and AWS Lambda.