AWS Cloud Operations & Migrations Blog

Use Amazon Athena and AWS CloudTrail to estimate billing for AWS Config rule evaluations

AWS Config is a service that enables you to audit your AWS resources for compliance with a desired configuration state. You are billed based on the number of Configuration Items (a point-in-time snapshot of an AWS resource) recorded and the number of AWS Config rules (a function that reports resource compliance) evaluated per resource per month.

In this post, I’ll show you how to use Amazon Athena to query AWS CloudTrail logs for a detailed billing breakdown of AWS Config rule evaluations. This breakdown can help you determine which rules contribute the most to monthly costs so you can optimize accordingly (for example, by reducing how frequently certain rules run). I don’t dive into configuration items in this post. For more information about those, see the Identifying resources with the most configuration changes using AWS Config blog post.

Step 1: Create an Athena table to query CloudTrail logs

Create an Athena table and configure it with the location and schema of the CloudTrail logs. The easiest way to do this is to use the auto-generated statement available in the CloudTrail console.

From the left navigation pane, choose Event history, and then choose Create Athena table.

The Event history page displays the last 90 days of management events. It includes filter fields and the Download events and Create Athena table buttons.

Figure 1: Event history page in the CloudTrail console

On the Create a table in Amazon Athena page, choose your CloudTrail log. For Storage location, choose the S3 bucket that contains the CloudTrail log files. The table name and table comment placeholders (indicated by the square brackets) will be auto-populated for you.

The ‘Create a table in Amazon Athena’ pop-up menu includes fields for storage location. The Athena table name is generated automatically, but you can rename it in the Athena console.

Figure 2: Create a table in Amazon Athena

You can either choose Create table, or, if you need to modify the query, copy the statement and execute it in the Athena console.

Your CREATE TABLE statement should look similar to the following, which has a modified table name (LINE 1) and table comment (LINE 46):

CREATE EXTERNAL TABLE cloudtrail_management_logs (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
COMMENT 'CloudTrail table for cloudtrail-management-logs'
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://cloudtrail-awslogs-123456789012/AWSLogs/123456789012/CloudTrail/'
TBLPROPERTIES ('classification'='cloudtrail');

Step 2: Create a simple query of CloudTrail data

Use the following query to ensure that the Athena table was created correctly. Replace [your_cloudtrail_table] (LINE 2) with the chosen table name from the previous step. In the example above, the table name was cloudtrail_management_logs.

SELECT *
FROM "[your_cloudtrail_table]"
WHERE eventsource = 'config.amazonaws.com'
  AND eventname = 'PutEvaluations'
LIMIT 10;

If the table was created correctly, you will see 10 results. This is the beginning of the full query, which you will expand on later in the post. If you read through the query, you can see it filters on two attribute values: config.amazonaws.com (events that originated from AWS Config) and PutEvaluations (events that record the compliance state of a resource).

Notice two of the columns returned by the query: requestparameters and additionaleventdata. You will extract data from these JSON objects in the next steps. For a detailed explanation of the properties in these objects, see the PutEvaluations API and ConfigRule API documentation, respectively.
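
Before building the full query, you can peek at individual properties in those columns with Athena’s JSON functions. The following is a minimal sketch, assuming the table name from Step 1:

-- json_extract_scalar returns the rule name as plain text (without quotes);
-- json_extract returns the evaluations array as a JSON value.
SELECT
  json_extract_scalar(additionaleventdata, '$.configRuleName') AS rule_name,
  json_extract(requestparameters, '$.evaluations') AS evaluations
FROM cloudtrail_management_logs
WHERE eventsource = 'config.amazonaws.com'
  AND eventname = 'PutEvaluations'
LIMIT 10;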

Step 3: Query the number of AWS Config rule evaluations by rule name

Use the following query to retrieve the number of AWS Config rule evaluations, grouped by rule name and sorted from the highest to the lowest number of evaluations.

Replace the table name placeholder (LINE 6) with the name of your Athena table.

WITH dataset AS (
  SELECT 
     json_extract(additionaleventdata,'$.configRuleName') AS rule_name, 
     CAST(json_extract(requestparameters, '$.evaluations') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS evaluations_array

  FROM [your_cloudtrail_table]
  WHERE eventsource = 'config.amazonaws.com'
    AND eventname = 'PutEvaluations'
    LIMIT 500
),
expanded_dataset as (
  SELECT 
    rule_name,
    e['complianceType'] as compliance_type
 
  FROM dataset, UNNEST(evaluations_array) AS t(e)
  
  WHERE e['complianceType'] != 'NOT_APPLICABLE'
)
SELECT 
  rule_name,
  count(compliance_type) as evaluations
FROM expanded_dataset
GROUP BY rule_name
ORDER BY evaluations DESC

If successfully executed, you will receive results similar to the following:


#   rule_name                 evaluations
1   "no-public-s3-buckets"    305
2   "valid-sg-only"           132
3   "rds-in-private-vpc"      107
4   "encrypted-glue"          106
5   "modern-rds-versions"     100
6   "ec2-with-cost-tags"       99

Let’s break down the query:

LINE 3: Uses the json_extract function to extract configRuleName from additionaleventdata using a JSONPath expression ('$.configRuleName').

LINE 4: Similar to LINE 3, extracts evaluations from requestparameters using json_extract and casts it as an ARRAY<MAP>.

LINE 16: Uses the UNNEST operator to expand the evaluations array, extracted in LINE 4, into individual elements.

LINE 18: Filters out evaluations that are NOT_APPLICABLE, because they do not contribute to costs.

LINE 22: Counts the number of evaluations of each rule.

The query also contains a LIMIT clause (LINE 9) because, by default, Athena queries all CloudTrail log files in the specified S3 bucket. Depending on how large the bucket is, this query might take a long time to execute. Feel free to experiment by removing the LIMIT clause to see whether the query times are acceptable to you. (Even then, the query might take more than a minute to execute.)
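
If you prefer to bound results by time instead of row count, you can filter on eventtime. CloudTrail stores it as an ISO 8601 string, so lexicographic comparison works. Keep in mind that without partitions, Athena still scans every file in the bucket, so this bounds the result set rather than the amount of data scanned. A sketch, with illustrative dates:

-- Counts PutEvaluations events for January 2021 only.
-- eventtime is an ISO 8601 string (for example, 2021-01-15T12:34:56Z),
-- so string comparison orders correctly.
SELECT count(*) AS putevaluations_events
FROM cloudtrail_management_logs
WHERE eventsource = 'config.amazonaws.com'
  AND eventname = 'PutEvaluations'
  AND eventtime >= '2021-01-01T00:00:00Z'
  AND eventtime < '2021-02-01T00:00:00Z';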

It is fine to scan all files in an S3 bucket while experimenting with Athena queries. However, for long-term use, it’s more performant and cost-effective to partition your data. When you use Amazon Athena, you are charged for the amount of data that you scan. For more information, see the Amazon Athena pricing page.

Steps 4-6 describe how to create and query partitions. If you don’t want to do this, you can skip ahead to Step 7.

Step 4: Create a new table with partition keys

Partitioning Athena tables means creating logical groups that restrict the amount of data scanned by each query. A common best practice is to partition by date or time. If you look at the CREATE TABLE statement in Step 1 of this post, the LOCATION clause looks like the following:

LOCATION 's3://cloudtrail-awslogs-123456789012/AWSLogs/123456789012/CloudTrail/'

If you browse the objects in that S3 bucket, you’ll see that they are organized by AWS Region, year, month, and day. In other words, they are already physically partitioned (see Figure 3). The Regions in your account might be different.
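
For example, an individual log file object key follows a pattern like the following (the trailing file name is illustrative):

s3://cloudtrail-awslogs-123456789012/AWSLogs/123456789012/CloudTrail/us-east-1/2021/01/15/123456789012_CloudTrail_us-east-1_20210115T0000Z_a1b2c3d4.json.gz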

However, because these physical partition locations are not registered in Athena, by default every query scans all of the folders.

CloudTrail S3 bucket organized into folders by account, region, year, month and day

Figure 3: CloudTrail S3 bucket Region folders

To partition your table, you will first need to define partition keys using an additional clause in the create table statement. Use the same CREATE TABLE statement from Step 1, but add a PARTITIONED BY clause as seen in the example below (LINE 47). You’ll see that I also added a _partitioned suffix to the table name to differentiate it from the unpartitioned table (LINE 1).

CREATE EXTERNAL TABLE cloudtrail_management_logs_partitioned (
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
COMMENT 'CloudTrail table for cloudtrail-management-logs'
PARTITIONED BY (region string, year string, month string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://cloudtrail-awslogs-123456789012/AWSLogs/123456789012/CloudTrail/'
TBLPROPERTIES ('classification'='cloudtrail');

LINE 47: There are three partition keys: region, year, and month. You can create more partition keys (for example, account ID and day) as appropriate for your use case.

Note: Pay attention to where the PARTITIONED BY clause is entered (after COMMENT). An incorrect clause order will cause a syntax error.

Once you’ve entered your partition keys, you can run the statement within the Athena console to create your partitioned table.

Step 5: Register partition locations

After creating the partitioned Athena table, you can register each partition and its respective S3 location using an ALTER TABLE statement.

The following example statement will register a partition. Replace the placeholders with your values.

ALTER TABLE cloudtrail_management_logs_partitioned ADD 
   PARTITION (region='us-east-1',
              year='2021',
              month='01')
   LOCATION 's3://[bucket-name]/AWSLogs/[account-id]/CloudTrail/us-east-1/2021/01/'

In this statement, I mapped the logical partition values 'us-east-1', '2021', and '01' to a physical S3 location. Using this statement as a template, you can register as many partitions as you need.
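
You can also register several partitions in a single ALTER TABLE statement, which is convenient when backfilling multiple months or Regions. A sketch, with placeholder bucket name and account ID:

-- Registers two partitions at once (placeholders as in the example above).
ALTER TABLE cloudtrail_management_logs_partitioned ADD
   PARTITION (region='us-east-1', year='2021', month='02')
   LOCATION 's3://[bucket-name]/AWSLogs/[account-id]/CloudTrail/us-east-1/2021/02/'
   PARTITION (region='us-west-2', year='2021', month='01')
   LOCATION 's3://[bucket-name]/AWSLogs/[account-id]/CloudTrail/us-west-2/2021/01/'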

Step 6: Query using partitions

Now you can modify the query in Step 3, updating the WHERE clause to limit the search to specific partitions. Your query will look like the following example. Modify the table name (LINE 6) and partitions (LINES 9 to 11) with your own values.

WITH dataset AS (
  SELECT 
     json_extract(additionaleventdata,'$.configRuleName') AS rule_name, 
     CAST(json_extract(requestparameters, '$.evaluations') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS evaluations_array

  FROM default.cloudtrail_management_logs_partitioned
  WHERE eventsource = 'config.amazonaws.com'
    AND eventname = 'PutEvaluations'
    AND region = 'us-east-1'
    AND year = '2021'
    AND month = '01'
),
expanded_dataset as (
  SELECT 
    rule_name,
    e['complianceType'] as compliance_type
 
  FROM dataset, UNNEST(evaluations_array) AS t(e)
  
  WHERE e['complianceType'] != 'NOT_APPLICABLE'
)
SELECT 
  rule_name,
  count(compliance_type) as evaluations
FROM expanded_dataset
GROUP BY rule_name
ORDER BY evaluations DESC

When you execute this query, you’ll see that it runs much faster and scans less data overall.
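
Because partition keys behave like ordinary columns in the WHERE clause, you can also span several registered partitions in one query. For example, the following sketch counts PutEvaluations events across two months, assuming both partitions were registered in Step 5:

-- Scans only the January and February 2021 partitions for us-east-1.
SELECT count(*) AS putevaluations_events
FROM default.cloudtrail_management_logs_partitioned
WHERE eventsource = 'config.amazonaws.com'
  AND eventname = 'PutEvaluations'
  AND region = 'us-east-1'
  AND year = '2021'
  AND month IN ('01', '02');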

Step 7: Estimate cost

With the query successfully returning the number of rule evaluations per AWS Config rule, you can expand the SELECT statement to include a cost column:

  count(compliance_type) * 0.001 as cost

The 0.001 ($0.001 per evaluation) is based on the lowest pricing tier from current AWS Config pricing. Your total number of evaluations might put you in a different pricing tier, so update this value accordingly. The entire SELECT statement (including cost) will look like this:

SELECT 
  rule_name,
  count(compliance_type) as evaluations,
  count(compliance_type) * 0.001 as cost
FROM expanded_dataset
GROUP BY rule_name
ORDER BY evaluations DESC

Note: Due to factors such as tiered pricing, these are estimated costs. Also remember that custom rules run as AWS Lambda functions in your account, so standard Lambda execution costs apply as well. These additional costs are not accounted for in the query above.
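
If your account’s evaluation volume spans multiple tiers, you can approximate tiered pricing directly in SQL. The following sketch replaces the final SELECT of the earlier query (it reuses the expanded_dataset common table expression). The tier boundaries and rates shown are illustrative placeholders; substitute the current values from the AWS Config pricing page. Because tiers apply to your account-wide total, this yields one overall estimate rather than a per-rule figure:

-- Illustrative tiered pricing: one rate for the first band of evaluations,
-- a lower rate for the next band, and a third rate beyond that. Replace the
-- boundaries (100000, 500000) and rates (0.001, 0.0008, 0.0005) with the
-- current values from the AWS Config pricing page.
SELECT
  count(compliance_type) AS evaluations,
  least(count(compliance_type), 100000) * 0.001
    + greatest(least(count(compliance_type), 500000) - 100000, 0) * 0.0008
    + greatest(count(compliance_type) - 500000, 0) * 0.0005 AS estimated_cost
FROM expanded_dataset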

Enhancements and next steps

Here are some ideas to get started:

Visualize the queries using Amazon QuickSight

Using the same query, you can create an Amazon QuickSight dashboard to visualize this data over time.

In this example, I visualized the month-over-month change in rule evaluations, the estimated cost for the last three months, and the number of evaluations per day in the last month.
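
As a starting point for the time-series visuals, the following sketch groups evaluations by month using the partitioned table from Step 4; the resulting table makes a convenient QuickSight dataset:

WITH dataset AS (
  SELECT 
    year,
    month,
    CAST(json_extract(requestparameters, '$.evaluations') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS evaluations_array
  FROM default.cloudtrail_management_logs_partitioned
  WHERE eventsource = 'config.amazonaws.com'
    AND eventname = 'PutEvaluations'
)
SELECT 
  year,
  month,
  count(*) AS evaluations
FROM dataset, UNNEST(evaluations_array) AS t(e)
WHERE e['complianceType'] != 'NOT_APPLICABLE'
GROUP BY year, month
ORDER BY year, month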

QuickSight dashboard displays bar graphs of estimated cost for rule evaluations for September, August, and July, and a graph of evaluations by day.

Figure 4: QuickSight dashboard graphs

Build queries for recorded configuration items

You can build similar queries to determine the number of recorded Configuration Items (and to estimate their cost). Configuration items are stored in a different S3 bucket. For more information, see the Identifying resources with the most configuration changes using AWS Config blog post.

In this example, I visualized the number of recorded Configuration Items, grouped by resource type, in the last month.

QuickSight dashboard pie chart shows 1,104 configuration items were recorded in the last month. The configuration items are represented in the chart by AWS resource type

Figure 5: QuickSight dashboard pie chart

Automate partition creation

In this post, you registered partitions manually (Step 5). You can automate the registration of partitions by creating a custom Lambda function and scheduling it to execute the ALTER TABLE statement every month.
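
The statement that such a function would render and run each month looks like the following sketch. The IF NOT EXISTS clause makes it safe to re-run, and the Region, date values, and S3 path would be generated from the current date at execution time:

-- Idempotent monthly registration (placeholders as in Step 5).
ALTER TABLE cloudtrail_management_logs_partitioned ADD IF NOT EXISTS
   PARTITION (region='us-east-1', year='2021', month='02')
   LOCATION 's3://[bucket-name]/AWSLogs/[account-id]/CloudTrail/us-east-1/2021/02/'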

Conclusion

In this blog post, I’ve demonstrated how you can use Amazon Athena to query AWS CloudTrail logs and obtain a count of AWS Config rule evaluations by rule name. Now go ahead and experiment! Athena is a powerful tool that can help you gather all sorts of insights on your data without the need to transform it.

About the author

Wasim Hossain

Wasim Hossain is a Cloud Application Architect on the Professional Services team at AWS, based out of Toronto, Canada. He works with AWS customers to architect and build software that solves complex business problems. When he’s not working, he loves to travel and dabble in landscape photography. You can find him on Twitter @wascloud.