Use tags to create and maintain Amazon CloudWatch alarms for Amazon EC2 instances (Part 1)

This blog post is the first in a two-part series. I walk you through a solution to automatically create and enforce a standard set of Amazon CloudWatch metric alarms for Amazon Elastic Compute Cloud (Amazon EC2) instances by using Amazon EC2 instance tags.

Creating and configuring a standard set of CloudWatch alarms for a large fleet of EC2 instances can be time consuming and hard to govern. This is especially true in large-scale migrations and multi-account environments where you want to quickly establish a consistent set of alarms for your instances. The solution I describe in this blog post helps you quickly and consistently set up a standard set of CloudWatch alarms for new and running instances and remove the alarms when the instances are terminated.

In part two of this blog post series, I’ll walk you through steps you can take to enforce your standard alarm set with AWS Config rules to ensure that your alarming strategy remains in place.

Prerequisites

An AWS account with permissions to access Amazon EC2, AWS Identity and Access Management (IAM), AWS CloudFormation, Amazon CloudWatch, AWS Lambda, and AWS Systems Manager. If you don’t have permissions, contact your security team.
A VPC created in Amazon Virtual Private Cloud (Amazon VPC) with network connectivity to the CloudWatch service endpoints. If you don’t have a VPC, you can create one with a public subnet using this CloudFormation stack.
The AWS Command Line Interface (AWS CLI) installed with the permissions mentioned in the first prerequisite to perform the deployment.

Configure the CloudWatchAutoAlarms Lambda function

The CloudWatchAutoAlarms Lambda function enables you to create a standard set of alarms for EC2 instances and AWS Lambda functions when you tag them with an identifying tag.

The default configuration creates alarms for any Windows, Amazon Linux, Red Hat, Ubuntu, or SUSE Linux EC2 instance with the following metrics:

CPU Utilization
CPU Credit Balance (for T Class instances)
Disk Space (Amazon CloudWatch agent predefined basic metric)
Memory (Amazon CloudWatch agent predefined basic metric)

You can expand the set of alarms to include standard or custom EC2 metrics. Refer to the documentation on GitHub for details. You can also customize a number of settings by updating the Lambda function environment variables defined in the CloudWatchAutoAlarms.yaml CloudFormation template, which is used to deploy the CloudWatchAutoAlarms AWS Lambda function. The following list provides the environment variable names, default values, and descriptions of the settings you can update:

ALARM_TAG: Create_Auto_Alarms
- The CloudWatchAutoAlarms Lambda function only creates alarms for instances that are tagged with this tag key. The default tag key name is Create_Auto_Alarms. If you want to use a different name, change the value of the ALARM_TAG environment variable.
CLOUDWATCH_NAMESPACE: CWAgent
- You can change the namespace where the CloudWatchAutoAlarms Lambda function should look for your CloudWatch metrics. The default CloudWatch agent metrics namespace is CWAgent. If your CloudWatch agent configuration is using a different namespace, then update the CLOUDWATCH_NAMESPACE environment variable.
CLOUDWATCH_APPEND_DIMENSIONS: InstanceId, ImageId, InstanceType, AutoScalingGroupName
- You can add EC2 metric dimensions to all metrics collected by the CloudWatch agent. This environment variable should match the append_dimensions setting in your CloudWatch agent configuration. The default setting includes all the supported dimensions: InstanceId, ImageId, InstanceType, AutoScalingGroupName.
DEFAULT_ALARM_SNS_TOPIC_ARN
- You can define an Amazon Simple Notification Service (Amazon SNS) topic that the Lambda function specifies as the notification target for created alarms. You provide the Amazon SNS topic Amazon Resource Name (ARN) with the AlarmNotificationARN parameter when you deploy the CloudWatchAutoAlarms.yaml CloudFormation template. If you leave the AlarmNotificationARN parameter value blank, then the created alarms won’t send notifications.
You can update the thresholds for the default alarms by updating the following environment variables:
- ALARM_CPU_HIGH_THRESHOLD: 75
- ALARM_CPU_CREDIT_BALANCE_LOW_THRESHOLD: 100
- ALARM_MEMORY_HIGH_THRESHOLD: 75
- ALARM_DISK_PERCENT_LOW_THRESHOLD: 20
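To make the settings above concrete, here is a minimal Python sketch of how a Lambda function could read them with the listed defaults. The names `DEFAULTS`, `THRESHOLD_KEYS`, and `load_settings` are hypothetical and the parsing logic is an assumption for illustration; it is not the code in cw_auto_alarms.py.

```python
import os

# Defaults mirror the values listed in this post. The parsing below is an
# illustrative assumption, not code taken from cw_auto_alarms.py.
DEFAULTS = {
    "ALARM_TAG": "Create_Auto_Alarms",
    "CLOUDWATCH_NAMESPACE": "CWAgent",
    "CLOUDWATCH_APPEND_DIMENSIONS": "InstanceId, ImageId, InstanceType, AutoScalingGroupName",
    "ALARM_CPU_HIGH_THRESHOLD": "75",
    "ALARM_CPU_CREDIT_BALANCE_LOW_THRESHOLD": "100",
    "ALARM_MEMORY_HIGH_THRESHOLD": "75",
    "ALARM_DISK_PERCENT_LOW_THRESHOLD": "20",
}

THRESHOLD_KEYS = [key for key in DEFAULTS if key.endswith("_THRESHOLD")]


def load_settings(env=None):
    """Return settings from env (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Environment variables are always strings, so convert thresholds to numbers.
    for key in THRESHOLD_KEYS:
        settings[key] = float(settings[key])
    # Split the comma-separated dimension list into clean names.
    settings["CLOUDWATCH_APPEND_DIMENSIONS"] = [
        name.strip()
        for name in settings["CLOUDWATCH_APPEND_DIMENSIONS"].split(",")
        if name.strip()
    ]
    # A missing or blank topic ARN means alarms are created without notifications.
    settings["DEFAULT_ALARM_SNS_TOPIC_ARN"] = env.get("DEFAULT_ALARM_SNS_TOPIC_ARN") or None
    return settings
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out overrides locally, for example `load_settings({"ALARM_CPU_HIGH_THRESHOLD": "90"})`.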

For example, one of the default alarms is AutoAlarm-AWS/EC2-CPUUtilization-GreaterThanThreshold-5m-Average. When an instance with an EC2 instance tag key named Create_Auto_Alarms enters the running state, an alarm is created for the instance and the ALARM_CPU_HIGH_THRESHOLD AWS Lambda environment variable value is used as the threshold for the alarm. Other alarms are also created based on the alarms defined in the default_alarms dictionary in cw_auto_alarms.py.
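As a rough illustration of what that default CPU alarm amounts to, the following sketch builds the alarm definition as a dictionary. The function name `cpu_high_alarm_params` is hypothetical, and `EvaluationPeriods=1` is an assumption because this post does not state it. The dictionary keys are the parameter names used by the boto3 CloudWatch client's `put_metric_alarm` call.

```python
def cpu_high_alarm_params(alarm_name, instance_id, threshold, sns_topic_arn=None):
    """Build keyword arguments describing a CPU utilization alarm for one instance."""
    params = {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(threshold),  # for example, ALARM_CPU_HIGH_THRESHOLD
        "Period": 300,  # seconds; "5m" in the tag key
        "Statistic": "Average",
        "EvaluationPeriods": 1,  # assumption; not stated in this post
    }
    # Only attach a notification target when a topic ARN is configured.
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**params)`.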

In addition to these default alarms, you can create an alarm for any EC2-provided metric by tagging your instance with the following tag key syntax:

AutoAlarm-<Namespace>-<MetricName>-<ComparisonOperator>-<Period>-<Statistic>

The tag value is used to specify the threshold for the alarm. For more information about custom alarm creation and the CloudWatchAutoAlarms Lambda function, check the documentation in the GitHub repository.
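The tag key syntax can be broken down mechanically. The sketch below is one way to do that in Python, under two assumptions that are not confirmed by this post: none of the components contains a hyphen, and the period is a whole number followed by `s`, `m`, or `h` (only `5m` appears in this walkthrough). The names `AlarmTag` and `parse_alarm_tag` are hypothetical.

```python
from typing import NamedTuple


class AlarmTag(NamedTuple):
    namespace: str
    metric_name: str
    comparison_operator: str
    period_seconds: int
    statistic: str
    threshold: float


# Assumed period suffixes; only "5m" is used in this post.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_alarm_tag(key, value):
    """Split an AutoAlarm tag key into its parts and read the threshold from the value."""
    parts = key.split("-")
    if len(parts) != 6 or parts[0] != "AutoAlarm":
        raise ValueError(f"not an AutoAlarm tag key: {key!r}")
    _, namespace, metric_name, operator, period, statistic = parts
    # The period is expected to look like "5m": digits followed by a unit.
    if not period or period[-1] not in _UNIT_SECONDS or not period[:-1].isdigit():
        raise ValueError(f"unsupported period: {period!r}")
    seconds = int(period[:-1]) * _UNIT_SECONDS[period[-1]]
    return AlarmTag(namespace, metric_name, operator, seconds, statistic, float(value))
```

For the tag used later in this post, `parse_alarm_tag("AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-Average", "1")` yields the `AWS/EC2` namespace, a 300-second period, and a threshold of 1.0.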

Deploy the CloudWatchAutoAlarms Lambda Function

Follow these steps to deploy the CloudWatchAutoAlarms Lambda function into your AWS account. In this walkthrough, I use the us-east-1 Region, but you can change the value to a different Region if you prefer. For deployment into a multi-account, multi-Region AWS environment, check the documentation in the GitHub repository.

Clone the amazon-cloudwatch-auto-alarms GitHub repository to your computer using the following command and change to the directory:
```
git clone https://github.com/aws-samples/amazon-cloudwatch-auto-alarms
cd amazon-cloudwatch-auto-alarms
```
Configure the AWS CLI with credentials for your AWS account. In this walkthrough, I use temporary credentials provided by AWS Single Sign-On using the Command line or programmatic access option. This sets the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN AWS environment variables with the appropriate credentials for use with the AWS CLI.
Create an Amazon SNS topic that CloudWatchAutoAlarms will use for notifications. You can use this sample Amazon SNS CloudFormation template to create an SNS topic. Leave the OrganizationID parameter blank; it is used only for multi-account deployments.
```
aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms-sns-topic \
--template-body file://CloudWatchAutoAlarms-SNS.yaml \
--parameters ParameterKey=OrganizationID,ParameterValue="" \
--region <enter your aws region id, e.g. "us-east-1">
```
You can also use the AWS CloudFormation console to deploy the template.

After your Amazon SNS topic is created, subscribe an email address to the topic so it will receive notifications whenever the alarm thresholds are breached.

a) Go to the Amazon SNS console.
b) From the left pane, choose Topics, and then choose the Amazon SNS topic. If you used the sample Amazon SNS CloudFormation template, the topic name is CloudWatchAutoAlarmsSNSTopic.
c) Choose Create subscription and select Email for the Protocol. For Endpoint, enter an email address, and then choose Create subscription.

Figure 1: Add an email address to the Amazon SNS topic used for CloudWatch alarms notification.

d) Confirm the subscription by clicking the link in the confirmation email.

Figure 2: Confirm your email subscription to the CloudWatch alarms Amazon SNS topic.
Create an S3 bucket to store the CloudWatchAutoAlarms Lambda function deployment package. You can use this sample CloudFormation template to create an S3 bucket. Leave the OrganizationID parameter blank; it is used only for multi-account deployments.
```
aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms-s3-bucket \
--template-body file://CloudWatchAutoAlarms-S3.yaml \
--parameters ParameterKey=OrganizationID,ParameterValue="" \
--region <enter your aws region id, e.g. "us-east-1">
```
Create a zip file containing the CloudWatchAutoAlarms AWS Lambda function code located in the src directory. This is the deployment package that you will use to deploy the AWS Lambda function. On a Mac, you can use the zip command:
```
zip -j amazon-cloudwatch-auto-alarms.zip src/*
```
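If the zip command is not available on your platform, a short Python script can produce an equivalent package. This is a sketch under the assumption that the deployment package only needs the files directly inside src, with the directory prefix dropped as `zip -j` does. The function name `build_package` is hypothetical.

```python
import zipfile
from pathlib import Path


def build_package(src_dir="src", zip_path="amazon-cloudwatch-auto-alarms.zip"):
    """Zip the files directly inside src_dir, dropping the directory prefix."""
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(src_dir).iterdir()):
            # Skip subdirectories and hidden files, which the src/* glob would not match.
            if path.is_file() and not path.name.startswith("."):
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Running `build_package()` from the repository root writes amazon-cloudwatch-auto-alarms.zip to the current directory and returns the list of file names it added.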

Copy the amazon-cloudwatch-auto-alarms.zip file to your S3 bucket.

```
aws s3 cp amazon-cloudwatch-auto-alarms.zip s3://<your S3 bucket name>
```

You can retrieve the bucket name created in step 5 from the CloudFormation console or run the following AWS CLI command:

```
aws cloudformation describe-stacks --stack-name amazon-cloudwatch-auto-alarms-s3-bucket \
--query "Stacks[0].Outputs[?ExportName=='amazon-cloudwatch-auto-alarms-bucket-name'].OutputValue" \
--output text \
--region <enter your aws region id, e.g. "us-east-1">
```

Deploy the AWS Lambda function using the CloudWatchAutoAlarms.yaml CloudFormation template and the deployment package you uploaded to your S3 bucket. You will also need to enter the ARN for the SNS topic you created in step 3:

```
aws cloudformation create-stack --stack-name amazon-cloudwatch-auto-alarms \
--template-body file://CloudWatchAutoAlarms.yaml \
--capabilities CAPABILITY_IAM \
--parameters ParameterKey=S3DeploymentKey,ParameterValue=amazon-cloudwatch-auto-alarms.zip \
ParameterKey=S3DeploymentBucket,ParameterValue=<S3 bucket name with your deployment package> \
ParameterKey=AlarmNotificationARN,ParameterValue=<SNS Topic ARN for Alarm Notifications> \
--region <enter your aws region id, e.g. "us-east-1">
```

You can retrieve the SNS topic ARN from step 3 for the AlarmNotificationARN parameter value by running the following command:

```
aws cloudformation describe-stacks --stack-name amazon-cloudwatch-auto-alarms-sns-topic \
--query "Stacks[0].Outputs[?ExportName=='amazon-cloudwatch-auto-alarms-sns-topic-arn'].OutputValue" \
--output text \
--region <enter your aws region id, e.g. "us-east-1">
```
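The `--query` expressions used in the describe-stacks commands select a stack output by its export name. If you script the deployment in Python, the same lookup could be sketched as follows. The function name `output_by_export_name` is hypothetical, and the input is assumed to have the shape of a CloudFormation describe-stacks response.

```python
def output_by_export_name(describe_stacks_response, export_name):
    """Return the OutputValue of the first stack's output with the given ExportName."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return None
    for output in stacks[0].get("Outputs", []):
        if output.get("ExportName") == export_name:
            return output.get("OutputValue")
    return None  # no output with that export name
```

With boto3, the response would come from `boto3.client("cloudformation").describe_stacks(StackName="amazon-cloudwatch-auto-alarms-sns-topic")`, and the export name would be `amazon-cloudwatch-auto-alarms-sns-topic-arn`.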

You can use the AWS CloudFormation console to confirm that the amazon-cloudwatch-auto-alarms stack has been created. The resources shown in the following screenshot should be in the CREATE_COMPLETE state. Your physical IDs for the AWS::IAM::Role and AWS::Lambda::Permission resources will be different.

The Resources page in the AWS CloudFormation console displays resources for the amazon-cloudwatch-auto-alarms stack. The status of each is CREATE_COMPLETE.

Figure 3: Resources created by the amazon-cloudwatch-auto-alarms stack

Launch an EC2 instance with automatically created CloudWatch alarms

Now that you have deployed the CloudWatchAutoAlarms Lambda function into your account, you can create a standard alarm set for your EC2 instances by adding the CloudWatchAutoAlarms activation tag. There are many ways to launch an instance with EC2 instance tags. For this walkthrough, I use the Amazon EC2 console.

Create an IAM role with CloudWatch permissions for your EC2 instances

First, create an IAM role so your EC2 instance has permission to send data to CloudWatch in your account. Follow the steps in the Amazon CloudWatch documentation to create the CloudWatchAgentServerRole. Include the AmazonSSMManagedInstanceCore AWS managed IAM policy in the CloudWatchAgentServerRole role. This allows you to connect to the instance with AWS Systems Manager Session Manager.

Launch a new EC2 instance from the Amazon EC2 console

In this walkthrough, I use the console to launch an instance manually with the required Create_Auto_Alarms EC2 instance tag for default alarm set creation.

1. Sign in to your AWS account and open the Amazon EC2 console.
2. Choose Launch Instance. For this walkthrough, I select the Amazon Linux 2 AMI.
  
  Figure 4: Launch an Amazon Linux 2 instance from the console
3. Choose the t2.micro instance type, and then choose Next: Configure Instance Details.
4. Choose a VPC and subnet. Choose a subnet that has internet connectivity so that the CloudWatch agent can connect to the CloudWatch service endpoints. For this walkthrough, you can use a subnet created with this sample Amazon VPC CloudFormation stack.
5. For IAM role, choose the CloudWatchAgentServerRole role you created earlier. This role provides the EC2 instance with permission to send data to CloudWatch.
  
  Figure 5: Configure the EC2 instance details
6. Under Advanced Details, for User data, enter:
```
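#!/usr/bin/env bash
# Install the CloudWatch agent from the Amazon Linux 2 package repository.
yum install amazon-cloudwatch-agent -y
# Load the agent's default configuration for EC2 (-m ec2 -c default) and start it (-s).
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c default -s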
```
  Figure 6: Enter a user data script to install the CloudWatch agent
  
  This installs the required Amazon CloudWatch agent for EC2. This approach isn’t scalable or recommended for large-scale use. See the Installing the CloudWatch agent using Systems Manager Distributor and State Manager section in the Designing and implementing logging and monitoring with Amazon CloudWatch AWS Prescriptive Guidance for an organization-wide, scalable solution.
7. Choose Next: Add Storage. Keep the defaults, and then choose Next: Add Tags.
8. Choose Add Tag and enter Create_Auto_Alarms for the tag key. Keep the value empty for the Create_Auto_Alarms tag key. By adding the tag key, the CloudWatchAutoAlarms Lambda function automatically creates a default alarm set for the EC2 instance.
9. Choose Review and Launch, and then choose Launch.
10. On Select an existing key pair or create a new key pair, choose Proceed without a key pair. You can still sign in to the EC2 instance using AWS Systems Manager Session Manager because the AWS Systems Manager agent is installed by default on Amazon Linux instances and the AmazonSSMManagedInstanceCore AWS managed IAM policy is attached to the role used by the EC2 instance.
11. Choose Launch Instances.
  
  Figure 7: Proceed without a key pair is selected
  
  As soon as the instance reaches the running state, the Initiate-CloudWatchAutoAlarms Amazon CloudWatch Events rule is triggered and invokes the CloudWatchAutoAlarms Lambda function. The Lambda function describes the instance that has been launched and checks for an EC2 instance tag key named Create_Auto_Alarms. If the tag key is present, it creates the default alarms using the environment variable thresholds.
12. Open the Amazon CloudWatch console and confirm on the Alarms page that the alarms are created:
  
  Figure 8: View the alarms created by the CloudWatchAutoAlarms Lambda function
  
  The alarms are named using the following format:
  
  AutoAlarm-<InstanceId>-<Namespace>-<MetricName>-<Dimensions…>-<ComparisonOperator>-<Threshold>-<Period>
  
  Initially, the state displayed for the newly created alarms is Insufficient data. Their true state will be displayed after the CloudWatch agent starts sending sufficient metric data to CloudWatch. After a few minutes, you should see the updated state of the alarms. The t2.micro instance used in this walkthrough is allocated 30 launch credits at launch. Because 30 is below the default ALARM_CPU_CREDIT_BALANCE_LOW_THRESHOLD value of 100, the AutoAlarm-<instance id omitted>-AWS/EC2-CPUCreditBalance-LessThanThreshold-100-5m alarm is triggered. This sends a notification to the Amazon SNS topic ARN specified in the DEFAULT_ALARM_SNS_TOPIC_ARN environment variable for the CloudWatchAutoAlarms Lambda function. Because you subscribed an email address to the Amazon SNS topic, you should receive an email message like this one:
  
  Figure 9: Notification email message for breached alarm
  
  You can integrate the Amazon SNS topic used by the CloudWatchAutoAlarms Lambda function with multiple endpoints to perform other actions. For example, you can create multiple subscriptions on the Amazon SNS topic to integrate it with a chatbot, Slack channel, or even another Lambda function that performs heuristics and remediation. CloudWatch alarms also support EC2 actions and other options, such as creating AWS Systems Manager OpsItems. You can configure these types of alternative actions separately on each alarm.
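The trigger flow and alarm naming described in steps 11 and 12 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the solution's actual code: the function names are hypothetical, the event is assumed to be an EC2 instance state-change notification with `instance-id` and `state` fields in its `detail`, and tags are assumed to be a list of `Key`/`Value` dictionaries as returned when describing an instance.

```python
def instance_to_process(event):
    """Return the instance ID if the event reports an EC2 instance entering 'running'."""
    if event.get("source") != "aws.ec2":
        return None
    detail = event.get("detail", {})
    if detail.get("state") != "running":
        return None
    return detail.get("instance-id")


def has_activation_tag(tags, alarm_tag="Create_Auto_Alarms"):
    """Check for the activation tag key; its value does not matter."""
    return any(tag.get("Key") == alarm_tag for tag in tags)


def alarm_name(instance_id, namespace, metric_name, operator, threshold, period, dimensions=()):
    """Join the parts in the order given by the naming format in step 12."""
    parts = [
        "AutoAlarm", instance_id, namespace, metric_name,
        *dimensions, operator, f"{threshold:g}", period,
    ]
    return "-".join(parts)
```

For example, calling `alarm_name` with the `AWS/EC2` namespace, the `CPUCreditBalance` metric, `LessThanThreshold`, a threshold of 100, and a period of `5m` reproduces the pattern of the credit balance alarm name mentioned in step 12.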

Create an additional alarm for your EC2 instance using tags

In this section, I walk you through the steps to create an alarm for your EC2 instance by adding an additional tag to it. I use the tag key syntax AutoAlarm-<Namespace>-<MetricName>-<ComparisonOperator>-<Period>-<Statistic> to create an alarm for the StatusCheckFailed CloudWatch EC2 metric.

Open the Amazon EC2 console and choose the instance that you launched in the previous section. Choose the Tags tab and view the tags on the EC2 instance that you launched. You will find that the Create_Auto_Alarms tag has been updated with a timestamp value that indicates when the standard alarm set was created for the EC2 instance:

Figure 10: View the Create_Auto_Alarms updated tag
On the Tags tab, choose Manage tags, and then choose Add tag. For Key, enter AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-Average. For Value, enter 1. Choose Save. The valid values for the StatusCheckFailed metric are either 0 or 1. If the value is 1, it means that either the instance or system status check has failed.

Figure 11: Add a custom alarm tag to the EC2 instance
The CloudWatchAutoAlarms Lambda function checks for new and updated alarm tags when an instance enters the running state. Stop and start the EC2 instance to force an update.
After the instance is stopped and restarted, go to the Alarms page in the CloudWatch console to confirm that the alarm was created:

Figure 12: View the newly created custom alarm in the console

You should find a new alarm named AutoAlarm-<instance id omitted>-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-1-5m.

You can use maintenance windows as opportunities to create new alarms by adding the appropriate alarm tags for your instances. During the maintenance window, you can initiate a stop and start of the instance to trigger the creation of new alarms. You can also update the thresholds for created alarms by updating the tag values, causing the alarm to be updated when the instance is stopped and started.
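To plan threshold updates before a maintenance window, it can help to see which alarm tags have changed. The following Python sketch collects the custom alarm tags from an instance's tag list and compares two snapshots. The function names are hypothetical, and the tag list is assumed to use the `Key`/`Value` dictionary shape returned when describing an instance.

```python
def custom_alarm_thresholds(tags, prefix="AutoAlarm-"):
    """Map each custom alarm tag key to its numeric threshold."""
    thresholds = {}
    for tag in tags:
        key = tag.get("Key", "")
        if not key.startswith(prefix):
            continue  # for example, Create_Auto_Alarms or Name
        try:
            thresholds[key] = float(tag.get("Value", ""))
        except (TypeError, ValueError):
            continue  # skip tags whose value is not a number
    return thresholds


def changed_thresholds(previous, current):
    """Return the tags that are new or whose threshold differs from the previous snapshot."""
    return {key: value for key, value in current.items() if previous.get(key) != value}
```

Any entry returned by `changed_thresholds` corresponds to an alarm that would be created or updated the next time the instance is stopped and started.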

When you use tags to create custom alarms, you can selectively customize the alarming for specific instances while deploying a standard set of alarms across all instances. This is especially useful during a large-scale migration where many stakeholders participate in the migration of different types of workloads.

Cleanup

To avoid incurring additional charges in your account, clean up the resources you created.

Terminate the EC2 instance that you provisioned to test the solution. In the Amazon EC2 console, choose Instance State, and then choose Terminate. After you terminate the instance, the alarms that were created for the instance are deleted.
Delete the amazon-cloudwatch-auto-alarms CloudFormation stack used to deploy the CloudWatchAutoAlarms solution.
Delete the Amazon SNS topic created for alarm notifications. If you deployed the sample SNS CloudFormation template, delete the CloudFormation stack.
Delete the Amazon S3 bucket created for the AWS Lambda deployment package. If you deployed the sample S3 CloudFormation template, delete the CloudFormation stack. You will need to empty the S3 bucket before you delete the bucket or CloudFormation stack.
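If you want to verify that no alarms remain after an instance is terminated, or remove leftovers yourself, the following Python sketch shows the bookkeeping involved. It assumes the alarm names start with AutoAlarm- followed by the instance ID, as in the naming format shown earlier, and it groups names into batches of 100 on the assumption that this is the most the CloudWatch DeleteAlarms API accepts per call. The function names are hypothetical.

```python
def alarms_for_instance(alarm_names, instance_id):
    """Select the alarm names that belong to one instance."""
    prefix = f"AutoAlarm-{instance_id}-"
    return [name for name in alarm_names if name.startswith(prefix)]


def batches(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]
```

With boto3, each batch could then be passed to `boto3.client("cloudwatch").delete_alarms(AlarmNames=batch)`.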

Conclusion

In this blog post, I showed how to accelerate CloudWatch alarm setup for your EC2 instances by using EC2 instance tags. This solution is based on the open source CloudWatchAutoAlarms Lambda function.

You can use this solution to establish a standard set of alarms, based on standard EC2 instance metrics, for all your EC2 instances, and to create alarms for AWS Lambda functions as well. You can also use custom CloudWatch metrics captured using the CloudWatch agent for your alarms. In addition to a standard alarm set, you can create alarms for a specific EC2 instance using Amazon EC2 instance tags without affecting other instances. When an EC2 instance that is using automatic alarms is terminated, the solution deletes the associated alarms automatically. This solution helps you standardize alarms across many accounts and Regions and can be used during large-scale migrations to AWS. Because your CloudWatch alarms integrate with Amazon SNS, you can route notifications to email, chat channels, or Lambda functions that perform remediation.

In part two of this series, I’ll show you how you can use AWS Config rules to enforce standard alarm set creation and remediate instances missing the required activation tag.

AWS Cloud Operations & Migrations Blog