Querying your AWS Cost and Usage Report using Amazon Athena
The AWS Cost & Usage Report contains the most comprehensive information available on your costs and usage. To extract value quickly from your AWS Cost & Usage Report, query the underlying data source using Amazon Athena.
The Athena serverless interactive query service directly queries the information in your Amazon S3 bucket, and analyzes it using standard SQL. This includes additional metadata relating to pricing, products, and reservations. The best part is that you don’t have to set up or maintain any infrastructure.
This post gives you an overview of how you can quickly and easily enable, configure, and query your AWS cost and usage information using Athena.
Getting started with the AWS Cost & Usage Report
Enable the AWS Cost & Usage Report directly from the Cost & Usage Reports page in the console. Alternately, you can create the report configuration programmatically through the Report Preferences API. For more information, see Creating an AWS Cost & Usage Report.
During the report configuration process, I recommend creating a dedicated S3 bucket for storing your reports, as you may want to adjust the access permissions in the future. AWS is seeing a growing interest across finance, engineering, and operations groups in accessing underlying cost and usage data. Those groups may have the following needs:
- Creating custom cost applications and reports
- Generating granular forecasts
- Performing deep dives into specific usage patterns
Creating a dedicated S3 bucket now for your Cost & Usage Reporting saves you time later, for controlling access, implementing custom reporting use cases, or avoiding changes to your existing file setup.
To configure Athena integration during the report configuration process, make sure that you select the check the box next to it for the Enable report data integration for option.
Selecting this option signals to AWS that you want to output your cost and usage data in Apache Parquet format, which is automatically partitioned into year and month folders in S3. Since Athena is priced per query and charges are based on the amount of data scanned by the query, this allows you to optimize query performance and cost. Because you scan less data, having the AWS Cost & Usage Report data in Apache Parquet format (which has column-based compression) saves from 30% to 90% on per-query costs. For more information, see the Athena pricing page.
Configuring Athena data integration
After your AWS Cost & Usage Report is enabled, use a standard AWS CloudFormation template to perform a one-time configuration of an AWS Glue crawler. This makes sure that your latest cost and usage information is always available to Athena—with no additional work required to prepare your data for analysis.
For more information, see a step-by-step guide to setting up your Athena data integration.
Querying your AWS Cost & Usage Report using Athena
After you create your stack, navigate to the Athena console. When you configure your dataset correctly, your AWS Cost & Usage Report appears in the left navigation pane under Database.
Use the following command to query your total spend by product for 2019:
sum(line_item_unblended_cost) AS cost
FROM <YOUR TABLE NAME>
GROUP BY line_item_product_code
HAVING sum(line_item_unblended_cost) > 0
ORDER BY line_item_product_code;
From there, perform whichever queries you like against any data field available in the AWS Cost & Usage Report. For more information about the items and columns in the report, see Cost & Usage Report Details.
Athena allows you to quickly and easily query your AWS Cost & Usage Report, making it more cost-effective (and easier) than ever to see your underlying usage patterns and cost drivers. Furthermore, the standard AWS service integrations automatically make it so that you’re always querying your latest available cost and usage data. You can access a guided tutorial here.