AWS Cloud Operations & Migrations Blog

How to query your AWS resource configuration states using AWS Config and Amazon Athena

This blog was updated by Chris Chiott, Solutions Architect WWPS, on Nov 4, 2022.

Tracking and managing the states of your AWS resources can be a challenge, especially as your account grows and you integrate with more and more AWS services. AWS Config is a service that helps make tracking your resources easy by continuously monitoring and recording your AWS resource configurations and maintaining a history of configuration changes to those resources. AWS Config also delivers a periodic snapshot of the latest resource configurations to your Amazon S3 bucket.

Customers often seek to extract intelligence and run queries on the data stored by AWS Config. This allows cloud administrators to, for example, identify Amazon EC2 instances of a particular size or AMI ID, or EC2 instances that host a specific version of software. This type of intelligence data can easily be extracted using Amazon Athena, which is an interactive query service that allows you to analyze data within Amazon Simple Storage Service (Amazon S3) using standard SQL.

In this blog post, I’ll walk you through how you can combine the vast knowledge AWS Config has of your AWS resources with the powerful query capabilities of Amazon Athena to perform advanced searches of your resources. I’ll also guide you through a few example queries to help get you started.

Sign up for AWS Config and Amazon Athena

If you have not already done so, you should first sign up for AWS Config and Amazon Athena. For instructions, follow the AWS Config Getting Started and Amazon Athena Setting Up pages.

Enable Amazon S3 configuration snapshot delivery in AWS Config

The first step in setting up Amazon Athena so it can query your AWS Config data is to export your AWS Config data to Amazon S3. AWS Config provides this capability through the configuration snapshot delivery feature. You can use this feature to configure AWS Config to regularly deliver a JSON file to Amazon S3 containing the configurations of all your AWS resources recorded by Config. We’ll begin by turning this feature on and configuring it to export to Amazon S3 once a day.

To make AWS Config begin exporting to Amazon S3, your AWS Config delivery channel needs to be updated to include a delivery frequency for property “configSnapshotDeliveryProperties”. Add this property to your delivery channel using the PutDeliveryChannel API action. To begin with, retrieve the existing state of your delivery channel using the DescribeDeliveryChannels API action. This can be done as follows using the AWS CLI:

$ aws configservice describe-delivery-channels

The API returns a value for your delivery channel like the following:

{
    "s3BucketName": "marc-config-bucket"
    "snsTopicARN": "arn:aws:sns:us-east-1:123456789012:config-sns-topic",
    "name": "default"
}

Save this model to a file named “delivery-channel.json” and then update the model to include property “configSnapshotDeliveryProperties” with a “deliveryFrequency” of “TwentyFour_Hours”. Also specify property “s3BucketName” with the name of the S3 bucket to which you want AWS Config to export your data.


{
    "configSnapshotDeliveryProperties": {
        "deliveryFrequency": "TwentyFour_Hours"
    },
    "s3BucketName": "marc-config-bucket",
    "snsTopicARN": "arn:aws:sns:us-east-1:123456789012:config-sns-topic",
    "name": "default"
}

Save the file and then call the PutDeliveryChannel API action with the updated model:

$ aws configservice put-delivery-channel --delivery-channel file://delivery-channel.json

AWS Config will now begin exporting a configuration snapshot to the S3 bucket you specified once every 24 hours. The first export will occur within an hour of updating your delivery channel with a configuration snapshot delivery frequency.

Create an Amazon Athena table for your AWS Config data

Next, we’ll create an Amazon Athena table pointing to the Amazon S3 bucket you just configured AWS Config to export its data to. Creating a table in Amazon Athena is done using the CREATE EXTERNAL TABLE command. When calling this command, we’ll specify table columns that match the format of the AWS Config configuration snapshot files so that Athena knows how to query and parse the data.

To create the table, begin by navigating to the Query Editor in the Amazon Athena console.

Execute the following command in the Query Editor while viewing the “sampledb” database, replacing “<S3_BUCKET_NAME>” with the name of the bucket you specified in your AWS Config Delivery channel.

CREATE EXTERNAL TABLE aws_config_configuration_snapshot (
 fileversion STRING,
 configsnapshotid STRING,
 configurationitems ARRAY < STRUCT <
        configurationItemVersion : STRING,
        configurationItemCaptureTime : STRING,
        configurationStateId : BIGINT,
        awsAccountId : STRING,
        configurationItemStatus : STRING,
        resourceType : STRING,
        resourceId : STRING,
        resourceName : STRING,
        ARN : STRING,
        awsRegion : STRING,
        availabilityZone : STRING,
        configurationStateMd5Hash : STRING,
        configuration : STRING,
        supplementaryConfiguration : MAP <STRING, STRING>,
        tags: MAP <STRING, STRING> ,
        resourceCreationTime : STRING > >
) 
PARTITIONED BY ( accountid STRING, dt STRING, region STRING )
ROW FORMAT SERDE 
 'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'case.insensitive'='false',
  'mapping.fileversion'='fileVersion',
  'mapping.configsnapshotid'='configSnapshotId',
  'mapping.configurationitems'='configurationItems',
  'mapping.configurationitemversion'='configurationItemVersion',
  'mapping.configurationitemcapturetime'='configurationItemCaptureTime',
  'mapping.configurationstateid'='configurationStateId',
  'mapping.awsaccountid'='awsAccountId',
  'mapping.configurationitemstatus'='configurationItemStatus',
  'mapping.resourcetype'='resourceType',
  'mapping.resourceid'='resourceId',
  'mapping.resourcename'='resourceName',
  'mapping.arn'='ARN',
  'mapping.awsregion'='awsRegion',
  'mapping.availabilityzone'='availabilityZone',
  'mapping.configurationstatemd5hash'='configurationStateMd5Hash',
  'mapping.supplementaryconfiguration'='supplementaryConfiguration',
  'mapping.configurationstateid'='configurationStateId'
  )
LOCATION 's3://awsexamplebucket/AWSLogs/';

This command can be broken down as follows:

  • line 1 – line 21: Define table aws_config_configuration_snapshot, specifying table columns that match the JSON format of the AWS Config configuration snapshot files.
  • line 22: Define partition keys “region” and “dt”. AWS Config S3 object keys are organized by Region and date. In the next section, we’ll configure partitions on your table to refer to the respective Amazon S3 object keys.
  • line 23 – line 25: The configuration snapshot files are in JSON format. This line specifies the JSON SerDe which will be used for parsing the data.
  • line 26: It allows for names with duplicate tags that may be uppercase and lowercase to be case insensitive.
  • line 27 – line 44: Replaces all mapping properties with your column names and JSON keys.
  • line 45: Point the table to the Amazon S3 bucket to which AWS Config is exporting its data.

After running this query, you’ll have a new Athena table named aws_config_configuration_snapshot.

Setup table partitioning

We’ll next partition your aws_config_configuration_snapshot table based on the Region and date of the configuration snapshot files delivered to your AWS Config S3 bucket. With the enablement of configuration snapshot delivery in the first section, AWS Config is uploading a configuration snapshot to Amazon S3 once a day. The Amazon S3 object keys of these configuration snapshot file have the following format:

/AWSLogs/<ACCOUNT_ID>/Config/<REGION>/<YEAR>/<MONTH>/<DAY>/ConfigSnapshot/<CONFIGURATION_SNAPSHOT_FILE>.json.gz

For example:

/AWSLogs/123456789012/Config/us-east-1/2018/2/28/ConfigSnapshot/123456789012_Config_us-east-1_ConfigSnapshot_20180326T163348Z_da3e6f89-1480-450f-bd5c-4550997386g6.json.gz

When we created the table in Athena in the previous section, we included partition keys “region” and “dt”. We’ll configure these partition keys with values corresponding to the Region and date of each configuration snapshot file. By partitioning the Athena table in this way, it will allow us to easily write queries which target configuration snapshot files from specific Regions and dates.

To partition each of the configuration snapshots delivered to Amazon S3, we’ll create an AWS Lambda function that is triggered by the addition of new configuration snapshots to the AWS Config Amazon S3 bucket. Each time AWS Config delivers a configuration snapshot to Amazon S3, the Lambda function will be triggered and will add a partition value for key “region” and “dt” based on the Region and date of the given configuration snapshot file. For example, the Lambda function will add a partition with Region=”us-east-1″ and dt=”2018-02-28″ for configuration snapshot file “/AWSLogs/123456789012/Config/us-east-1/2018/2/28/ConfigSnapshot/123456789012_Config_us-east-1_ConfigSnapshot_20180326T163348Z_da3e6f89-1480-450f-bd5c-4550997386g6.json.gz”. In addition, a special partition with dt=”latest” will be added. This partition is kept up-to-date with whatever is the most recent configuration snapshot in each Region. Each day, when a new configuration snapshot is delivered to S3, partition key dt=”latest” will be updated to point to this latest file.

To create the Lambda function, begin by navigating to the Lambda console and choosing the Create function button. Choose the option to Author from scratch and then fill in “aws_config_bucket_events” as the function name, “Python 3.6” as the runtime, and select the option to “Create new role from template(s)”, assigning a role name of “aws_config_bucket_events_role”.

Then, choose Create function. Lambda will then create a new function along with a role for executing the function.

Now, go ahead and paste the following code into the function editor. This code handles receiving the PutObject requests from Amazon S3 and setting the appropriate partitions on the Athena table.

import datetime
import re
import boto3
import os

TABLE_NAME = 'aws_config_configuration_snapshot'
DATABASE_NAME = 'sampledb'
ACCOUNT_ID = None # Determined at runtime
LATEST_PARTITION_VALUE = 'latest'

athena = boto3.client('athena')

def lambda_handler(event, context):
    global ACCOUNT_ID

    object_key = event['Records'][0]['s3']['object']['key']
    match = get_configuration_snapshot_object_key_match(object_key)
    if match is None:
        print('Ignoring event for non-configuration snapshot object key', object_key)
        return
    print('Adding partitions for configuration snapshot object key', object_key)
    
    ACCOUNT_ID = context.invoked_function_arn.split(':')[4]
    object_key_parent = 's3://{bucket_name}/{object_key_parent}/'.format(
        bucket_name=event['Records'][0]['s3']['bucket']['name'],
        object_key_parent=os.path.dirname(object_key))
    configuration_snapshot_accountid = get_configuration_snapshot_accountid(match)
    configuration_snapshot_region = get_configuration_snapshot_region(match)
    configuration_snapshot_date = get_configuration_snapshot_date(match)
    
    drop_partition(configuration_snapshot_accountid, configuration_snapshot_region, LATEST_PARTITION_VALUE)
    add_partition(configuration_snapshot_accountid, configuration_snapshot_region, LATEST_PARTITION_VALUE, object_key_parent)
    add_partition(configuration_snapshot_accountid, configuration_snapshot_region, get_configuration_snapshot_date(match).strftime('%Y-%m-%d'), object_key_parent)
    
def get_configuration_snapshot_object_key_match(object_key):
    # Matches object keys like AWSLogs/123456789012/Config/us-east-1/2018/4/11/ConfigSnapshot/123456789012_Config_us-east-1_ConfigSnapshot_20180411T054711Z_a970aeff-cb3d-4c4e-806b-88fa14702hdb.json.gz
    return re.match('AWSLogs/(\d+)/Config/([\w-]+)/(\d+)/(\d+)/(\d+)/ConfigSnapshot/[^\\\]+$', object_key)

def get_configuration_snapshot_accountid(match):
    print('AccountId:', match.group(1))
    return match.group(1)

def get_configuration_snapshot_region(match):
    return match.group(2)

def get_configuration_snapshot_date(match):
    return datetime.date(int(match.group(3)), int(match.group(4)), int(match.group(5)))
    
def add_partition(accountid_partition_value, region_partition_value, dt_partition_value, partition_location):
    execute_query('ALTER TABLE {table_name} ADD PARTITION {partition} location \'{partition_location}\''.format(
        table_name=TABLE_NAME,
        partition=build_partition_string(accountid_partition_value, region_partition_value, dt_partition_value),
        partition_location=partition_location))
        
def drop_partition(accountid_partition_value, region_partition_value, dt_partition_value):
    execute_query('ALTER TABLE {table_name} DROP PARTITION {partition}'.format(
        table_name=TABLE_NAME,
        partition=build_partition_string(accountid_partition_value, region_partition_value, dt_partition_value)))
        
def build_partition_string(accountid_partition_value, region_partition_value, dt_partition_value):
    return "(accountid='{accountid_partition_value}', dt='{dt_partition_value}', region='{region_partition_value}')".format(
	    accountid_partition_value=accountid_partition_value,
        dt_partition_value=dt_partition_value,
        region_partition_value=region_partition_value)

def execute_query(query):
    print('Executing query:', query)
    query_output_location = 's3://aws-athena-query-results-{account_id}-{region}'.format(
        account_id=ACCOUNT_ID,
        region=os.environ['AWS_REGION'])
    start_query_response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE_NAME
        },
        ResultConfiguration={
            'OutputLocation': query_output_location,
        }
    )
    print('Query started')
    
    is_query_running = True
    while is_query_running:
        get_query_execution_response = athena.get_query_execution(
            QueryExecutionId=start_query_response['QueryExecutionId']
        )
        query_state = get_query_execution_response['QueryExecution']['Status']['State']
        is_query_running = query_state in ('RUNNING','QUEUED')
        
        if not is_query_running and query_state != 'SUCCEEDED':
            raise Exception('Query failed')
    print('Query completed')

A number of settings need to be updated to make the function work properly. First, configure the function to trigger when new objects are added to the AWS Config bucket by expanding the Designer drop-down list and then selecting “S3” as a trigger. Configure the trigger, selecting your AWS Config bucket as the trigger bucket, and leave the default trigger event type of “Object Created (All)”. Both the Prefix and Filter pattern sections can be left blank. Then, choose Add to add the trigger, and finally choose Save to save your changes.

You should also increase the function timeout to 1 minute to ensure that the function always has time to finish setting up all partitions.

Finally, navigate to the IAM console and open role “aws_config_bucket_events_role”. Attach policy “AmazonAthenaFullAccess” in order to grant the permissions necessary for calling Amazon Athena APIs.

Simple example query

It might take up to an hour for your first configuration snapshot to be delivered to Amazon S3. Alternatively, you could run this statement in the CLI to create an immediate configuration snapshot.

aws configservice deliver-config-snapshot --delivery-channel-name channel_name --region region_name 

After the first delivery has occurred, we’re ready to perform queries on your AWS resources in Amazon Athena. Let’s start with a simple example. Run the following command in the Athena Query Editor to list all EC2 instances in your account which are of type “t2.micro”:

SELECT configurationItem.resourceType, configurationItem.resourceId as InstanceId
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::EC2::Instance'
AND json_extract_scalar(configurationItem.configuration, '$.instancetype') = 't2.micro';

Here’s the break-down of this command:

  • line 1: Specify the properties you want to return in the query results. Here, we’re returning the resource type and resource ID for each EC2 instance found.
  • line 3: The configuration snapshot files exported by AWS Config store configuration items inside property “configurationItems”, which is an array. CROSS JOIN UNNEST is used to flatten this array such that each individual configuration item within the array can easily be queried. Specifying “t(configurationItem)” assigns name “configurationItem” to each of the entries within the array. “t” refers to the name we assign to the row and can be ignored.
  • line 4: The “dt” partition key with value “latest” is specified in order to scope the query down to only look at the latest configuration snapshot file in your S3 bucket. As mentioned in the previous section, the Lambda function is automatically keeping partition dt=”latest” to point to the latest configuration snapshot file for a Region. You can always specify dt=”latest” in your queries if you are interested in querying the latest configuration states for your resources. You can also specify a value like dt=”2018-02-28″ in order to search your configuration snapshot data from a specific date.
  • line 5: Only look at configuration items for EC2 instances.
  • line 6: json_extract_scalar() is used to evaluate the JSON value of the “configuration” property. The “configuration” property of the configuration item for a resource is a JSON object of that resource’s configuration. However, if you look back to the SerDes used in the CREATE EXTERNAL TABLE command, “configuration” is defined as a string. The reason for this is that the model of the “configuration” property differs for each resource type recorded by AWS Config. So, there is no single SerDes that can be defined for this property which applies to all configuration items. Therefore, json_extract_scalar() is used to evaluate the JSON despite not having included the model for the configuration JSON in the Athena table SerDes. In this function call, “$.instancetype” is specified as the JSONPath expression in order to refer to the instanceType property of EC2 instance’s configuration. Note that this path must be lower-case due to the fact that column names used by the function are lower cased.

The above command returns results such as the following:

ResourceType InstanceId
AWS::EC2::Instance i-12345678912
AWS::EC2::Instance i-12345678913
AWS::EC2::Instance i-12345678914

Another example

The following query acts on a number of additional configuration item properties and uses additional Presto functions, performing a lookup for all Amazon DynamoDB tables containing an owner tag of TeamA or TeamB that have an update stream enabled and are keyed based on more than property.

SELECT configurationItem.resourceType, configurationItem.resourceId as TableName
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::DynamoDB::Table'
AND (configurationItem.tags['owner']='TeamA' OR configurationItem.tags['owner']='TeamB')
AND json_extract_scalar(configurationItem.configuration, '$.streamspecification.streamenabled') = 'true'
AND json_array_length(json_extract(configurationItem.configuration, '$.keyschema')) > 1
ResourceType TableName
AWS::DynamoDB::Table SubscriptionsTable
AWS::DynamoDB::Table UserAccounts
AWS::DynamoDB::Table CustomerResources

Custom resource views

The previous sample queries only return resource identifier information in the query results. However, you can also structure your queries to return customized views of your AWS resources. The following query returns a list of all Amazon S3 buckets in your account, alongside a view of each bucket’s setting for Requester Pays, Bucket Acceleration, and Versioning.

SELECT configurationItem.resourceId as BucketName,
if(configurationItem.supplementaryConfiguration['isrequesterpaysenabled'] = 'true', 'Enabled', 'Off') as RequesterPaysStatus,
coalesce(json_extract_scalar(configurationItem.supplementaryConfiguration['bucketaccelerateconfiguration'], '$.status'), 'Off') as BucketAccelerationStatus,
json_extract_scalar(configurationItem.supplementaryConfiguration['bucketversioningconfiguration'], '$.status') as BucketVersioningStatus
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::S3::Bucket'

Results for this query appear as follows:

BucketName RequesterPaysStatus BucketAccelerationStatus BucketVersioningStatus
marc-bucket-1 Enabled Enabled Enabled
marc-bucket-2 Off Enabled Suspended
marc-bucket-3 Enabled Off Off

Querying data from multiple Regions

This blog post thus-far has provided instructions as if you only use AWS Config in a single AWS Region. However, if you use AWS Config in more than one Region, the aws_config_configuration_snapshot Athena table can seamlessly be made to support queries on these additional Regions by simply setting them all up to export configuration snapshot data to the same Amazon S3 bucket. Simply repeat the steps in section “Enable Amazon S3 configuration snapshot delivery in AWS Config” in all Regions in which you use AWS Config to set this up. You will then be able to perform incredibly powerful, cross-Region queries against your AWS resources.

By default, queries executed against the aws_config_configuration_snapshot table will search configuration snapshot data for all Regions contained within your S3 bucket. Whether your Amazon S3 bucket contains configuration snapshot data for a single Region or multiple Regions, queries executed against the Athena table are written and behave exactly the same. If, however, you would like to target your queries to a configuration snapshot in a specific Region, you can easily do this by specifying a Region in your queries. Recall from the CREATE EXTERNAL TABLE command that we added a “region” partition key to the Athena table. This partition key is being automatically set by the Lambda function we added earlier. If you execute a query that doesn’t include this partition key, Athena searches data for all Regions. For example:

Retrieve the list of all resources recorded by AWS Config in all Regions

SELECT configurationItem.resourceType, configurationItem.resourceId
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'

Retrieve the list of all resources recorded by AWS Config only in Region us-east-1

SELECT configurationItem.resourceType, configurationItem.resourceId
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND region = 'us-east-1'

Popular use cases

Before wrapping up, here are a few additional sample queries based on popular use cases to help get you started.

List all instances missing a “Name” tag

SELECT configurationItem.resourceId as InstanceId
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::EC2::Instance'
AND configurationItem.tags['name'] IS NULL

List Windows instances running the SSM Agent that are missing Windows Update “KB4035631”

SELECT configurationItem.resourceType, configurationItem.resourceId as InstanceId
FROM default.awsconfig_region
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::SSM::ManagedInstanceInventory'
AND json_extract_scalar(configurationItem.configuration, '$.aws:instanceinformation.content["' || configurationItem.resourceId || '"].platformtype') = 'Windows'
AND json_extract(configurationItem.configuration, '$.aws:windowsupdate.content.kb4035631') IS NULL

Sort your Certificate Manager certificates by their expiration date

SELECT configurationItem.resourceType, configurationItem.resourceId as CertificateArn, json_extract_scalar(configurationItem.configuration, '$.notafter') as CertificateExpiration
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::ACM::Certificate'
ORDER BY CertificateExpiration

List IAM users that are using inline policies

SELECT configurationItem.resourceName as User
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::IAM::User'
AND json_array_length(json_extract(configurationItem.configuration, '$.userpolicylist')) > 0

List your default VPC in each Region

SELECT configurationItem.awsRegion as Region, configurationItem.resourceId as VpcId
FROM sampledb.aws_config_configuration_snapshot
CROSS JOIN UNNEST(configurationitems) AS t(configurationItem)
WHERE dt = 'latest'
AND configurationItem.resourceType = 'AWS::EC2::VPC'
AND json_extract_scalar(configurationItem.configuration, '$.isdefault') = 'true'

Conclusion

Combining the query power of Amazon Athena with the configuration data stored by AWS Config allows for extremely powerful and useful lookups on your AWS resources. I hope this blog post has demonstrated this to you and helps you in managing your AWS resources.

About the Author


Marc Labrecque is an engineer on AWS Config based in Seattle, Washington.