Using CloudTrail to identify unexpected behaviors in individual workloads

In this post, we describe a practical approach that you can use to detect anomalous behaviors within Amazon Web Services (AWS) cloud workloads by using behavioral analysis techniques that can be used to augment existing threat detection solutions. Anomaly detection is an advanced threat detection technique that should be considered when a mature security baseline as described in the security pillar of the AWS Well-Architected framework is in place.

Why you should consider behavior-based detection in the cloud

Traditionally, threat detection solutions focus on the endpoint and the network and analyze log events for known indicators of attack and indicators of compromise. Other forms of threat detection focus on the user and data using products such as data loss prevention and user and endpoint behavior analytics to detect suspicious user behavior at the data layer. Both solution types analyze operating system, application level, and network logs and focus on the detection of known tactics, techniques, and procedures, but the cloud control plane and other cloud native log sources are outside the use case of traditional threat detection solutions

Being able to detect malicious behavior in your environment is necessary to stay secure in the cloud. This includes the detection of events when cloud services might have been misused. The challenge is that related activities are logged on a control plane level and don’t leave any traces in log sources that are traditionally analyzed for threat detection. For example, unwanted data movements between cloud services or cloud accounts use the cloud backplane for data transfers and don’t necessarily touch any endpoint or network gateway. Therefore, related events only appear within cloud native logs such as AWS CloudTrail or AWS Config and not in network or operating system logs.

Figure 1: Solution architecture example

In the simplified example shown in Figure 1, only data streams that pass from the cloud to the firewall and then to AWS services are visible to the endpoint (an Amazon EC2 instance) or the gateway security solution.

Data streams that pass through serverless solutions and activities of cloud native services are only visible in cloud native logs.

Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect AWS accounts, and analyzes not only the network flow logs but also the cloud control plane. GuardDuty uses threat intelligence coupled with machine learning and behavior models to detect threats such as account compromise and unusual data access or communications, and should be activated in each cloud account.

But not all unwanted behavior follows known attack patterns. Unwanted behaviors can also include normal activity inside a cloud environment that is different from the intended behavior of a particular workload. Each activity or log entry by itself might not look malicious, but a series of events can reveal possible malicious intent when compared to the individual context of the application. Because there are no bad events as such in CloudTrail like in a firewall or antivirus log, the challenge is to detect threats based on noncompliant behaviors in the context of the application use case and not on known threat vectors.

Anomaly detection is playing an increasingly important role in defense strategies because of the constantly evolving attack and obfuscation techniques that make it hard to detect threats based on known tactics, techniques, and procedures.

What does unwanted behavior look like?

One approach to identifying key events that are related to unwanted behaviors is to identify a set of anomaly-related questions around common cloud activities that consider the workload context. Depending on the workload type, unwanted cloud API events and related questions could look like the following:

Event: An EC2 instance was launched.
Question: Was an unexpected user or role used or was the EC2 instance launched outside the pipeline?

Event: A user or role performs many API list and describe events within a short timeframe.
Questions: Does the application normally generate list and describe API calls in production? If not, this could be reconnaissance activity performed by an intruder.

Event: A user or role creates and shares an Amazon Elastic Block Store (Amazon EBS) snapshot with another account.
Question: Is the snapshot sharing event expected? If not, it could be an attempt to exfiltrate data.

Event: Many failed API calls are detected in CloudTrail.
Question: Are these failed calls around sensitive services or information? If yes, an unauthorized user could be exploring the environment.

Event: Many ListBucket events are detected for a sensitive Amazon Simple Storage Service (Amazon S3) bucket.
Question: Are these events unexpected and performed by an unexpected identity? If yes, an unauthorized user performing an S3 bucket enumeration might indicate a reconnaissance activity.

After a set of questions has been identified, they can be converted into application specific threat detection use cases, which can be applied to sensitive production environments. This is a useful strategy because these environments typically have a predictable usage pattern. The predictable patterns reduce the chance of false positives, making it worth the effort of developing use cases for monitoring anomalies. Threat detection use cases can be identified within CloudTrail logs using security information and event management (SIEM) tools or Amazon CloudWatch rules.

Detecting anomalies in CloudTrail with CloudWatch

Activities within your AWS account can be recorded with CloudTrail, which makes it the ideal service not only for deeper investigations into past cloud activities but also to detect unwanted behaviors in near real time. CloudTrail sends logs to an S3 bucket and can forward events to CloudWatch. Using CloudWatch, you can perform searches across all CloudTrail events and define CloudWatch alarms for automatic notifications.

You can create alerts for individual CloudTrail events that you consider an anomaly by creating CloudWatch filters and alarms. A filter defines the events that you want to monitor and an alarm defines the threshold when you want to be notified.

To create a filter for the preceding S3 bucket enumeration example, you would select the CloudTrail log group, and then select Metric Filters and create a new metric filter, as shown in Figure 2.

Figure 2: Create CloudWatch metric filter

Excluding the userAgent AWS Internal excludes S3 access activities performed by other AWS services such as AWS Access Analyzer or Amazon Macie which can be considered normal behavior.

Save this metric filter in a new name space that you use for all of your anomaly detection monitoring. After you have created the filter, create a new CloudWatch alarm based on your filter. Depending on your filter and alarm thresholds, you will receive CloudWatch alarm notifications through a Amazon Simple Notification Service (Amazon SNS) topic and have the opportunity to automatically launch other actions that can perform incident response activities.

After an alert is raised, you can use the same filter pattern to search for the relevant events in CloudWatch. The CloudTrail events will provide more information about who performed the S3 ListBucket events such as IP address (sourceIPAddress), who performed the action (userIdentity), or if the action was performed through the AWS Management Console or AWS Command Line Interface (AWS CLI) (userAgent = aws-internal or aws-cli). Figure 3 that follows is an example of a CloudTrail log.

Figure 3: CloudTrail example log

Detecting anomalies using traps

Another simple, but effective technique to detect intruders based on unwanted behaviors is to use decoy services such as canaries or honey pots. Honey pots are designed to provide information about the behavior of attackers by providing them fake production environments that they can explore—such as hosts within a subnet or data stores such as databases or storage services with dummy data. Canaries are identities or access tokens within honey pot environments that look like privileged identities. Honey pots and canaries both appear attractive to attackers due to the names that are used for users, databases, or host names, but don’t expose the organization to risk if compromised.

Using CloudWatch alarms, you can monitor CloudTrail for events that indicate that attackers have started to explore the honey pot or tried to laterally move using the canary access token. By acting like an attacker yourself, you can generate test events within CloudTrail that will help you to identify the event details—such as event sources, event names, and parameters—that you want to monitor. Here are some examples of CloudTrail events you can monitor for different kinds of traps.

Trap	Event source	Event name	Example instance or user name
Login attempt using a canary identity	signin.amazonaws.com	ConsoleLogin	Backup_Admin
Assume role attempt using a canary role	sts.amazonaws.com	AssumeRole	DevOps_role
Exploration of a honey pot database	dynamodb.amazonaws.com	ListTable	CustomerAccounts
Exploration of a honey pot storage service	s3.amazonaws.com	GetObject	PasswordBackup

Traps are typically deployed in production environments where access and use patterns are predictable and strictly controlled. They’re a cost effective and easy to implement solution that can provide alarms with a high degree of certainty. Traps also offer a good chance to catch even the most sophisticated threat actors; especially when they use highly automated attacks.

Detecting statistical anomalies

AWS CloudTrail Insights is a feature of CloudTrail that can be used to identify unusual operational activity in your AWS accounts such as spikes in resource provisioning, bursts of AWS Identity and Access Management (IAM) activity, or gaps in periodic maintenance activity.

CloudTrail Insights can provide primary indicators for noncompliant behaviors by establishing a baseline for normal behavior and then generating Insights events when it detects unusual patterns. Primary indicators are events that initiate an investigation.

But even when statistical changes haven’t reached alert thresholds and no issue is raised, statistical insights can be used as a supporting secondary indicator during investigations to better understand the context of an incident. Even minor changes of specific API calls around sensitive data can provide valuable information after an alert from another solution such as GuardDuty, or when the previously described anomaly detection techniques have been raised.

Figure 4 that follows is an example of an Insights chart showing API calls over time.

Figure 4: CloudTrail Insights example chart

Conclusion

In this post I described the importance of monitoring sensitive workloads for noncompliant or unwanted behaviors to complement existing security solutions. Anomaly detection in the cloud monitors cloud service activities on the control plane and checks to see if the behavior is expected in the context of each workload. The effort to set up and support the tools described in this blog post leads to an affordable, practical, and powerful mechanism for the detection of sophisticated threat actors in the cloud. To learn more about how you can analyze API activities in the cloud, see Analyzing AWS CloudTrail in Amazon CloudWatch in the AWS Management & Governance Blog.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.