AWS Cloud Financial Management
How and why you should move to Cost and Usage Report (CUR) 2.0?
Cost & Usage Report (CUR) 2.0 was released at re:Invent 2023, as part of the launch of AWS Billing and Cost Management Data Exports (read this blog). We want to show you the benefits of CUR 2.0 and provide steps on how to migrate to CUR 2.0, so you don’t lose out! CUR 2.0 builds upon the Legacy CUR, while offering several key improvements for your cost tracking. Both 2.0 and Legacy allow you to analyze AWS costs in greater detail and accuracy, especially by including Resource IDs and hourly time granularity.
Why should you move to CUR 2.0?
- Consistent schema: Contains a fixed set of columns to facilitate data ingestion. This change will lower changes to the schema and to ensure smooth data usage.
- Additional columns: Two additional columns “bill_payer_account_name” and “line_item_usage_account_name” have been added. This change makes the data more user-friendly as you can now attribute costs based on names rather than trying to remember account ID’s.
- Nested data: CUR 2.0 nests certain columns together. They are now presented as key-value pairs in single columns, reducing data sparsity and improving schema consistency. Collapsed columns: resource_tags, cost_category, product, discount.
- SQL capabilities: You can now select specific columns when creating an export and even edit these selections after creation. It offers more flexibility, allowing teams to create a smaller dataset with required columns to fit business needs.
- Compatibility with Legacy CUR: Importantly, CUR 2.0 maintains backwards compatibility with the legacy CUR. You can write SQL queries to generate a schema that matches the legacy format, easing the transition with existing data pipelines and workflows.
How to move to CUR 2.0?
When moving to CUR 2.0, you have a 3 options depending on how you have setup Legacy CUR. Please also note there are some permissions that will be needed to be added to your role which can be found here in Billing and Cost Management Data Exports access guide.
Option 1: Create a new CUR 2.0
As mentioned earlier, CUR 2.0 nests certain columns from Legacy CUR together. This results in 4 “map” or json data type columns: resource_tags
, cost_category
, product
, discount
). If you want to create a CUR 2.0, the best process is:
- Create an export of CUR 2.0 with its new schema. You can create the CUR 2.0 either through the console or using the APIs (steps below).
- After creating the CUR 2.0, you can get it backfilled with up to 36 months of historical data by creating a customer support ticket and specifying: you account ID, the export name, and the number of months to backfill.
- If you are moving to CUR 2.0 from the legacy CUR, you may need to either (a) update some of your SQL queries to use the nested columns they previously used the un-nested columns. Or (b) un-nest these columns as part of your ETL process to have these columns be accessible for your SQL queries.
Creating a new export via Data Exports APIs
If you wish to make your new CUR through the CLI here are the basic steps:
- Determine your export SQL query and table configurations – See this document for valid SQL in Data Exports and the Table Dictionary.
- Prepare an S3 bucket and give the proper IAM permissions – Create this bucket and add this policy to it.
- Prepare your “Export” json object that defines export settings – You can find the general structure of the Export object here.
- Write export object into text file in CLI and save. – Save your export json file as “export.json” with nano in the CloudShell CLI.
- Call the API with your export object. –
aws bcm-data-exports create-export --export file://export.json
Option 2: Replicate your Legacy CUR schema in CUR 2.0 due to data pipeline dependencies
If you were using Legacy CUR, your existing data pipeline and SQL queries may need to be adjusted during migration. You can accomplish this in Data Exports by writing a SQL query. When defining your export, you can extract the keys in the collapsed columns as separate columns. This way you can get a fixed export schema output, but have it look just like your Legacy CUR. There are two ways to do this:
- You can use this sample code on AWS’s GitHub to automatically generate the SQL query in Data Exports that matches an existing Legacy CUR Parquet that you have.
- You can manually write the SQL query using the Data Exports APIs to define the CUR 2.0 schema that you want to receive. Using the dot-operator, you can extract the collapsed columns to be received as separate columns in your export. (Column extraction documentation here and documentation of SQL supported in Data Exports here.)
Option 3: Use the Cloud Intelligence Dashboards (CIDs)
If you have deployed your CUR using the CIDs, then don’t worry we now can help you migrate to 2.0! There is a full deployment guide on how to migrate to CUR 2.0. You will deploy the new 2.0 version and the update your dashboard to run off that file. New customer deploying the CID foundational dashboards will be using CUR 2.0 and for existing customers there is a migration guide for CUR legacy to CUR 2.0.
Forward Compatibility Layer (Optional)
However, in case if you cannot migrate from Legacy CUR right now, you can install a proxy view using the CID CLI that will provide CUR2 interface to your Legacy CUR data.
cid-cmd create-cur-proxy --cur-version 2
Backward Compatibility Layer (Optional)
If you have CUR 2.0 data but you have a tool that requires Amazon Athena access to Legacy CUR, you can install a CUR1 proxy for Backward Compatibility.
cid-cmd create-cur-proxy —cur-version 1
How to update my queries to CUR 2.0?
To transform your Legacy CUR queries to CUR2.0 format, we have created this command line tool query-converter using Amazon Bedrock. This tool uses ‘claude-3-sonnet’ model extended with additional awareness about the specifics of CUR1 to CUR2.0 transformation.
Legacy CUR:
select sum(line_item_unblended_cost) as cost, product_product_name
from cur
where month = '8' and year ='2024'
group by product_product_name
order by cost
CUR 2.0
select sum(line_item_unblended_cost) as cost, product['product_name']
where SPLIT(billing_period,'-')[2] = '08’
group by 2
order by cost
As you can see, there are two main differences between the two queries. Firstly product_product_name is now pulled from a key value pair. Next, there is no month and year data anymore because the partition has changed to billing_period.
Summary
Now you know how to move to CUR 2.0 and ensure your usage of this data set is clear, we strongly encourage you to update. Don’t miss out on these and future updates! See a demo of CUR 2.0 here.