AWS Big Data Blog

Analyze Security, Compliance, and Operational Activity Using AWS CloudTrail and Amazon Athena

by Sai Sriparasa and Bob O'Dell | on | | Comments

As organizations move their workloads to the cloud, audit logs provide a wealth of information on the operations, governance, and security of assets and resources. As the complexity of the workloads increases, so does the volume of audit logs being generated. It becomes increasingly difficult for organizations to analyze and understand what is happening in their accounts without a significant investment of time and resources.

AWS CloudTrail and Amazon Athena help make it easier by combining the detailed CloudTrail log files with the power of the Athena SQL engine to easily find, analyze, and respond to changes and activities in an AWS account.

AWS CloudTrail records API calls and account activities and publishes the log files to Amazon S3. Account activity is tracked as an event in the CloudTrail log file. Each event carries information such as who performed the action, when the action was done, which resources were impacted, and many more details. Multiple events are stitched together and structured in a JSON format within the CloudTrail log files.

Amazon Athena uses Apache Hive’s data definition language (DDL) to create tables and Presto, a distributed SQL engine, to run queries. Apache Hive does not natively support files in JSON, so we’ll have to use a SerDe to help Hive understand how the records should be processed. A SerDe interface is a combination of a serializer and deserializer. A deserializer helps take data and convert it into a Java object while the serializer helps convert the Java object into a usable representation.

In this blog post, we will walk through how to set up and use the recently released Amazon Athena CloudTrail SerDe to query CloudTrail log files for EC2 security group modifications, console sign-in activity, and operational account activity. This post assumes that customers already have AWS CloudTrail configured. For more information about configuring CloudTrail, see Getting Started with AWS CloudTrail in the AWS CloudTrail User Guide.

Setting up Amazon Athena

Let’s start by signing in to the Amazon Athena console and performing the following steps.


Create a table in the default sampledb database using the CloudTrail SerDe. The easiest way to create the table is to copy and paste the following query into the Athena query editor, modify the LOCATION value, and then run the query.


LOCATION 's3://<Your CloudTrail s3 bucket>/AWSLogs/<optional – AWS_Account_ID>/'

with the S3 bucket where your CloudTrail log files are delivered. For example, if your CloudTrail S3 bucket is named “aws -sai-sriparasa” and you set up a log file prefix of  “/datalake/cloudtrail/” you would edit the LOCATION statement as follows:

LOCATION 's3://aws-sai-sriparasa/datalake/cloudtrail/'

CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
userIdentity STRUCT<
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
resources ARRAY<STRUCT<
eventType STRING,
apiVersion STRING,
readOnly STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING
LOCATION 's3://<Your CloudTrail s3 bucket>/AWSLogs/<optional – AWS_Account_ID>/';

After the query has been executed, a new table named cloudtrail_logs will be added to Athena with the following table properties.


Athena charges you by the amount of data scanned per query.  You can save on costs and get better performance when querying CloudTrail log files by partitioning the data to the time ranges you are interested in.  For more information on pricing, see Athena pricing.  To better understand how to partition data for use in Athena, see Analyzing Data in S3 using Amazon Athena.

Popular use cases

These use cases focus on:

  • Amazon EC2 security group modifications
  • Console Sign-in activity
  • Operational account activity

EC2 security group modifications

When reviewing an operational issue or security incident for an EC2 instance, the ability to see any associated security group change is a vital part of the analysis.

For example, if an EC2 instance triggers a CloudWatch metric alarm for high CPU utilization, we can first look to see if there have been any security group changes (the addition of new security groups or the addition of ingress rules to an existing security group) that potentially create more traffic or load on the instance. To start the investigation, we need to look in the EC2 console for the network interface ID and security groups of the impacted EC2 instance. Here is an example:

Network interface ID = eni-6c5ca5a8

Security group(s) = sg-5887f224, sg-e214609e

The following query can help us dive deep into the security group analysis. We’ll configure the query to filter for our network interface ID, security groups, and a time range starting 12 hours before the alarm occurred so we’re aware of recent changes. (CloudTrail log files use the ISO 8601 data elements and interchange format for date and time representation.)

Identify any security group changes for our EC2 instance:

select eventname, useridentity.username, sourceIPAddress, eventtime, requestparameters from cloudtrail_logs
where (requestparameters like '%sg-5887f224%' or requestparameters like '%sg-e214609e%' or requestparameters like '%eni-6c5ca5a8%')
and eventtime > '2017-02-15T00:00:00Z'
order by eventtime asc;

This query returned the following results:

eventname username sourceIPAddress eventtime requestparameters
DescribeInstances 2017-02-15T00:57:23Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-5887f224″}]}}]}}
DescribeInstances 2017-02-15T00:57:24Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-e214609e”}]}}]}}
DescribeInstances 2017-02-15T17:06:01Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-e214609e”}]}}]}}
DescribeInstances 2017-02-15T17:06:01Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-5887f224″}]}}]}}
DescribeSecurityGroups 2017-02-15T23:28:20Z {“securityGroupSet”:{},”securityGroupIdSet”:{“items”:[{“groupId”:”sg-e214609e”}]},”filterSet”:{}}
DescribeInstances 2017-02-16T11:25:23Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-e214609e”}]}}]}}
DescribeInstances 2017-02-16T11:25:23Z {“instancesSet”:{},”filterSet”:{“items”:[{“name”:””,”valueSet”:{“items”:[{“value”:”sg-5887f224″}]}}]}}
ModifyNetworkInterfaceAttribute bobodell 2017-02-16T19:09:55Z {“networkInterfaceId”:”eni-6c5ca5a8″,”groupSet”:{“items”:[{“groupId”:”sg-e214609e”},{“groupId”:”sg-5887f224″}]}}
AuthorizeSecurityGroupIngress bobodell 2017-02-16T19:42:02Z {“groupId”:”sg-5887f224″,”ipPermissions”:{“items”:[{“ipProtocol”:”tcp”,”fromPort”:143,”toPort”:143,”groups”:{},”ipRanges”:{“items”:[{“cidrIp”:”″}]},”ipv6Ranges”:{},”prefixListIds”:{}},{“ipProtocol”:”tcp”,”fromPort”:143,”toPort”:143,”groups”:{},”ipRanges”:{},”ipv6Ranges”:{“items”:[{“cidrIpv6″:”::/0″}]},”prefixListIds”:{}}]}}

The results show that the ModifyNetworkInterfaceAttribute and AuthorizedSecurityGroupIngress API calls may have impacted the EC2 instance. The first call was initiated by user bobodell and set two security groups to the EC2 instance. The second call, also initiated by user bobodell,  was made approximately 33 minutes later, and successfully opened TCP port 143 (IMAP) up to the world (cidrip:

Although these changes may have been authorized, these details can be used to piece together a timeline of activity leading up to the alarm.

Console Sign-in activity

Whether it’s to help meet a compliance standard such as PCI, adhering to a best practice security framework such as NIST, or just wanting to better understand who is accessing your assets, auditing your login activity is vital.

The following query can help identify the AWS Management Console logins that occurred over a 24-hour period. It returns details such as user name, IP address, time of day, whether the login was from a mobile console version, and whether multi-factor authentication was used.

select useridentity.username, sourceipaddress, eventtime, additionaleventdata
from default.cloudtrail_logs
where eventname = 'ConsoleLogin'
and eventtime >= '2017-02-17T00:00:00Z'
and eventtime < '2017-02-18T00:00:00Z';

Because potentially hundreds of logins occur every day, it’s important to identify those that seem to be outside the normal course of business. The following query returns logins that occurred outside our network (, those that occurred using a mobile console version, and those that occurred between midnight and 5:00 A.M.

select useridentity.username, sourceipaddress, json_extract_scalar(additionaleventdata, '$.MobileVersion') as MobileVersion, eventtime, additionaleventdata
from default.cloudtrail_logs 
where eventname = 'ConsoleLogin' 
and (json_extract_scalar(additionaleventdata, '$.MobileVersion') = 'Yes' 
or sourceipaddress not like '72.21.%' 
and eventtime >= '2017-02-17T00:00:00Z'
and eventtime < '2017-02-17T05:00:00Z');

Operational account activity

An important part of running workloads in AWS is understanding recurring errors, how administrators and employees are interacting with your workloads, and who or what is using root privileges in your account.

AWS event errors

Recurring error messages can be a sign of an incorrectly configured policy, the wrong permissions applied to an application, or an unknown change in your workloads. The following query shows the top 10 errors that have occurred from the start of the year.

select count (*) as TotalEvents, eventname, errorcode, errormessage 
from cloudtrail_logs
where errorcode is not null
and eventtime >= '2017-01-01T00:00:00Z' 
group by eventname, errorcode, errormessage
order by TotalEvents desc
limit 10;

The results show:

TotalEvents eventname errorcode errormessage
1098 DescribeAlarms ValidationException 1 validation error detected: Value ‘INVALID_FOR_SUMMARY’ at ‘stateValue’ failed to satisfy constraint: Member must satisfy enum value set: [INSUFFICIENT_DATA, ALARM, OK]
182 GetBucketPolicy NoSuchBucketPolicy The bucket policy does not exist
179 HeadBucket AccessDenied Access Denied
48 GetAccountPasswordPolicy NoSuchEntityException The Password Policy with domain name 341277845616 cannot be found.
36 GetBucketTagging NoSuchTagSet The TagSet does not exist
36 GetBucketReplication ReplicationConfigurationNotFoundError The replication configuration was not found
36 GetBucketWebsite NoSuchWebsiteConfiguration The specified bucket does not have a website configuration
32 DescribeNetworkInterfaces Client.RequestLimitExceeded Request limit exceeded.
30 GetBucketCors NoSuchCORSConfiguration The CORS configuration does not exist
30 GetBucketLifecycle NoSuchLifecycleConfiguration The lifecycle configuration does not exist

These errors might indicate an incorrectly configured CloudWatch alarm or S3 bucket policy.

Top IAM users

The following query shows the top IAM users and activities by eventname from the beginning of the year.

select count (*) as TotalEvents, useridentity.username, eventname
from cloudtrail_logs
where eventtime >= '2017-01-01T00:00:00Z' 
and useridentity.type = 'IAMUser'
group by useridentity.username, eventname
order by TotalEvents desc;

The results will show the total activities initiated by each IAM user and the eventname for those activities.

Like the Console sign-in activity query in the previous section, this query could be modified to filter the activity to view only events that occurred outside of the known network or after hours.

Root activity

Another useful query is to understand how the root account and credentials are being used and which activities are being performed by root.

The following query will look at the top events initiated by root from the beginning of the year. It will show whether these were direct root activities or whether they were invoked by an AWS service (and, if so, which one) to perform an activity.

select count (*) as TotalEvents, eventname, useridentity.invokedby
from cloudtrail_logs
where eventtime >= '2017-01-01T00:00:00Z' 
and useridentity.type = 'Root'
group by useridentity.username, eventname, useridentity.invokedby
order by TotalEvents desc;


 AWS CloudTrail and Amazon Athena are a powerful combination that can help organizations better understand the operations, governance, and security of assets and resources in their AWS accounts without a significant investment of time and resources.

About the Authors


Sai_Author_pic_resizeSai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.




BobO_Author_pic5_resizeBob O’Dell is a Sr. Product Manager for AWS CloudTrail. AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of AWS accounts.  Bob enjoys working with customers to understand how CloudTrail can meet their needs and continue to be an integral part of their solutions going forward.  In his spare time, he enjoys spending time with HRB exploring the new world of yoga and adventuring through the Pacific Northwest.



Analyzing Data in S3 using Amazon Athena