How to enable Amazon CloudWatch Alarms to send repeated notifications

Amazon CloudWatch Alarms is natively integrated with Amazon CloudWatch metrics. Many AWS services send metrics to CloudWatch, and AWS also offers several ways to emit your applications’ metrics as custom metrics. CloudWatch Alarms lets you watch a metric and detect when it crosses a static threshold or falls outside an anomaly detection band. It also lets you monitor the combined result of multiple alarms. An alarm then automatically initiates actions when its state changes between OK, ALARM, and INSUFFICIENT_DATA.

The most commonly used alarm action is to notify the responsible people or trigger downstream automation by sending a message to an Amazon Simple Notification Service (SNS) topic. CloudWatch Alarms is designed to invoke alarm actions only when a state change happens. The one exception is Auto Scaling actions, where the scaling action keeps being invoked periodically while the alarm remains in the state that was configured for the action.

There are scenarios where you may find it useful to have repeated notifications on certain critical alarms so that the corresponding team is alerted to take action promptly. In this post, I will show you how to use Amazon EventBridge, AWS Step Functions, and AWS Lambda to enable repeated alarm notifications on selected CloudWatch alarms. I will also discuss other customization use cases that the same solution model can support using alarm state change events.

Overview

Since 2019, Amazon EventBridge has integrated with Amazon CloudWatch so that when a CloudWatch alarm’s state changes, a corresponding CloudWatch alarm state change event is sent to the EventBridge service. You can create an EventBridge rule with a customized rule pattern to capture

all of the alarms’ state change events,
the alarms’ transitions to particular states,
and state change events of the alarms with certain prefixed names.

When an event matches, the rule invokes downstream automation to process the alarm’s state change event. This solution uses an AWS Step Functions state machine to orchestrate the repeated alarm notification workflow.
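To make the three kinds of rule pattern listed above more concrete, here is a minimal sketch in Python. The `source`, `detail-type`, `detail.state.value`, and `detail.alarmName` fields follow the shape of CloudWatch alarm state change events. The `prod-` prefix, the sample event, and the `matches` helper are assumptions made for illustration, and `matches` is a deliberately simplified stand-in for EventBridge's matching logic (exact values and `prefix` filters only).

```python
import json

# Pattern 1: every CloudWatch alarm state change event.
ALL_STATE_CHANGES = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
}

# Pattern 2: only transitions into a particular state (here, ALARM).
TO_ALARM_ONLY = {
    **ALL_STATE_CHANGES,
    "detail": {"state": {"value": ["ALARM"]}},
}

# Pattern 3: only alarms whose name starts with a hypothetical "prod-" prefix.
PREFIXED_NAMES = {
    **ALL_STATE_CHANGES,
    "detail": {"alarmName": [{"prefix": "prod-"}]},
}


def matches(pattern, event):
    """Simplified matcher: supports exact values and {"prefix": ...} filters only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
            continue
        # Leaf: a list of allowed values or prefix filters; any one may match.
        hit = False
        for candidate in expected:
            if isinstance(candidate, dict) and "prefix" in candidate:
                hit = hit or (isinstance(actual, str) and actual.startswith(candidate["prefix"]))
            else:
                hit = hit or actual == candidate
        if not hit:
            return False
    return True


# Trimmed-down sample event containing only the fields the patterns inspect.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "prod-cpu-high", "state": {"value": "ALARM"}},
}

# The JSON printed here is what you would paste into an EventBridge rule.
print(json.dumps(TO_ALARM_ONLY, indent=2))
```

The second pattern is the kind of pattern this solution relies on, because only transitions into the ALARM state should start the repeated notification workflow.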

In this solution, we enable repeated alarm notification by applying a specific tag to the CloudWatch alarm resources. Within the state machine, a Lambda function queries the tags of the triggered alarm and continues processing only when the specific tag (a key:value pair) is present. This approach also lets you create a centralized view of all of the alarms with repeated alarm notification enabled by creating a tag-based resource group. The resource group is included as an optional part of this solution.

Solution Architecture

This solution is deployed as an AWS Cloud Development Kit (CDK) application that deploys the resources highlighted within the blue rectangle in the following diagram to your AWS account. These resources are:

An EventBridge rule to capture all of the alarms’ state change events.
A Lambda function to check the alarm’s tag, describe the alarm’s current state, and send notifications to existing SNS alarm actions on the alarm.
A Step Functions state machine with a Wait state, the previously mentioned Lambda function as a Task state, and a Choice state.
Two AWS Identity and Access Management (IAM) roles: one that lets EventBridge invoke the state machine, and one that lets Lambda perform the required actions.
(Optional) A tag-based resource group including all of the CloudWatch alarms with the feature enablement tag.

Figure: Solution architecture diagram (explained further in the post).

This solution works as follows:

A CloudWatch alarm is triggered and goes into the ALARM state.
The CloudWatch alarm sends the first alarm notification to the associated SNS alarm actions.
The CloudWatch Alarms service sends an alarm state change event, which triggers the EventBridge rule. The rule pattern used is shown as follows; it captures every alarm’s transition to the ALARM state.

EventBridge rule pattern used to capture CloudWatch alarms’ transition to ALARM state event

When an event matches, the EventBridge rule invokes the Step Functions state machine target.
Once the state machine execution starts, it first enters a Wait state (“Wait X Seconds” as shown in the following figure). The wait period can be configured in the CDK application and is passed to the state machine definition.
Then, the state machine enters the Lambda invocation task (“Check alarm tag and status” in the figure below). The Lambda invocation task:
1. Checks whether the alarm has the specific tag key and value (e.g., RepeatedAlarm:true). If not, the function exits.
2. Checks the alarm’s current state by calling the DescribeAlarms API with the alarm name.
3. Publishes the alarm’s current status, as returned by the DescribeAlarms API call, to all of the SNS topics configured as alarm actions on the alarm.
4. Returns the alarm’s current state, together with the originally received event, to the state machine.
Finally, the Choice state (“Is alarm still in ALARM state?” in the figure below) checks the alarm state returned by the Lambda function. If the state is ‘ALARM’, the workflow goes back to the Wait state; otherwise, the execution ends.

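Figure: Step Functions state machine workflow (“Wait X Seconds”, “Check alarm tag and status”, “Is alarm still in ALARM state?”).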
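The Lambda task described above can be sketched roughly as follows. This is not the code shipped in the solution repository; it is a minimal illustration under several assumptions: the function name `check_alarm`, the `currState` return field, the `SKIPPED` and `DELETED` marker values, and reading the alarm ARN from the event's `resources` list are all choices made for this sketch. It also assumes that a notification is re-sent only while the alarm is still in ALARM. The clients are passed in as arguments so that the logic can be exercised without AWS access; in a real handler they would be `boto3.client("cloudwatch")` and `boto3.client("sns")`.

```python
import json


def check_alarm(event, cloudwatch, sns, tag_key="RepeatedAlarm", tag_value="true"):
    """Re-notify for a tagged alarm and report its current state.

    `cloudwatch` and `sns` are boto3-style clients, e.g.
    boto3.client("cloudwatch") and boto3.client("sns").
    """
    alarm_name = event["detail"]["alarmName"]
    # Assumption: the first entry in "resources" is the alarm ARN.
    alarm_arn = event["resources"][0]

    # 1. Exit early unless the opt-in tag is present on the alarm.
    tags = cloudwatch.list_tags_for_resource(ResourceARN=alarm_arn)["Tags"]
    if not any(t["Key"] == tag_key and t["Value"] == tag_value for t in tags):
        return {"event": event, "currState": "SKIPPED"}

    # 2. Look up the current state (metric and composite alarms alike).
    response = cloudwatch.describe_alarms(
        AlarmNames=[alarm_name],
        AlarmTypes=["MetricAlarm", "CompositeAlarm"],
    )
    alarms = response.get("MetricAlarms", []) + response.get("CompositeAlarms", [])
    if not alarms:
        # The alarm was deleted while the workflow was waiting.
        return {"event": event, "currState": "DELETED"}
    alarm = alarms[0]
    state = alarm["StateValue"]

    # 3. While still in ALARM, publish to every SNS topic among the alarm actions.
    if state == "ALARM":
        subject = f"ALARM: {alarm_name} remains in ALARM state in {event['region']}"
        message = json.dumps(
            {
                "AlarmName": alarm_name,
                "StateValue": state,
                "StateReason": alarm.get("StateReason", ""),
            }
        )
        for action_arn in alarm.get("AlarmActions", []):
            # Alarm actions can also be Auto Scaling or other ARNs; keep SNS only.
            if action_arn.split(":")[2] == "sns":
                # SNS subjects are limited to 100 characters.
                sns.publish(TopicArn=action_arn, Subject=subject[:100], Message=message)

    # 4. Hand the state and the original event back to the state machine.
    return {"event": event, "currState": state}
```

Returning the original event alongside the state lets the next loop iteration call the function again with the same input, and any value other than ALARM in `currState` gives the Choice state a reason to end the execution.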

The repeated notification for an alarm within the workflow above stops when:

The alarm transitions to a non-ALARM state.
The alarm is deleted.
The specific tag is removed from the alarm.
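The Wait, Task, and Choice loop, and the way it stops, can be sketched as an Amazon States Language definition built in Python. The state names come from the figure; the `$.currState` field, the `Done` state, the placeholder Lambda ARN, and the `simulate` helper are assumptions made for illustration. The solution itself generates its state machine through the CDK, so treat this as an approximation of the shape rather than the deployed definition.

```python
import json


def build_definition(wait_seconds, lambda_arn):
    """Return an Amazon States Language definition for the notification loop."""
    return {
        "StartAt": "Wait X Seconds",
        "States": {
            "Wait X Seconds": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "Check alarm tag and status",
            },
            "Check alarm tag and status": {
                "Type": "Task",
                "Resource": lambda_arn,
                "Next": "Is alarm still in ALARM state?",
            },
            "Is alarm still in ALARM state?": {
                "Type": "Choice",
                # Loop back only while the task reports ALARM.
                "Choices": [
                    {"Variable": "$.currState", "StringEquals": "ALARM", "Next": "Wait X Seconds"}
                ],
                "Default": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


def simulate(definition, task, payload, max_steps=100):
    """Walk the definition locally and count task invocations (waits are skipped)."""
    name = definition["StartAt"]
    invocations = 0
    for _ in range(max_steps):
        state = definition["States"][name]
        kind = state["Type"]
        if kind == "Succeed":
            return invocations
        if kind == "Wait":
            name = state["Next"]
        elif kind == "Task":
            payload = task(payload)
            invocations += 1
            name = state["Next"]
        elif kind == "Choice":
            name = state["Default"]
            for rule in state["Choices"]:
                field = rule["Variable"][2:]  # strip the leading "$."
                if payload.get(field) == rule["StringEquals"]:
                    name = rule["Next"]
                    break
    raise RuntimeError("state machine did not finish within max_steps")


# Hypothetical ARN, used only to make the definition complete.
definition = build_definition(300, "arn:aws:lambda:us-east-1:111122223333:function:example")

# Pretend the alarm stays in ALARM for two checks and then recovers.
observed = iter(["ALARM", "ALARM", "OK"])
checks = simulate(definition, lambda payload: {"currState": next(observed)}, {})
print(json.dumps(definition, indent=2))
print(checks)
```

In this sketch, all three stop conditions reach the Choice state as a `currState` value other than ALARM, so a single default branch covers them.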

Procedures

Now, let’s deploy the solution and see how it works.

Prerequisites

AWS Account with AWS Command Line Interface (CLI) access
Node.js 10.13 or later
AWS CDK
Docker service (in running state when performing the following steps)

Step 1: Deploy solution using AWS CDK

Before you can deploy a CDK application, make sure that you have the AWS CDK CLI installed and your AWS account bootstrapped, as described here. Then, run the following commands from your terminal to download the solution code and deploy it:

git clone https://github.com/aws-samples/amazon-cloudwatch-alarms-repeated-notification-cdk.git
cd amazon-cloudwatch-alarms-repeated-notification-cdk
npm install
npm run build
cdk bootstrap #Required for first time CDK deployment
cdk deploy --parameters RepeatedNotificationPeriod=300 --parameters TagForRepeatedNotification=RepeatedAlarm:true --parameters RequireResourceGroup=false

With the “cdk deploy” command, you can also configure the following parameters:

RepeatedNotificationPeriod: The time in seconds between two consecutive notifications from an alarm. The default is set to 300 in the CDK code.
TagForRepeatedNotification: The tag used to enable repeated notification on an alarm. It must be a key:value pair. The default for this parameter is RepeatedAlarm:true.
RequireResourceGroup: Whether or not to create a tag-based resource group to monitor all of the CloudWatch Alarms with repeated notification enabled. Allowed values: true/false.
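As a small illustration of the key:value format expected by TagForRepeatedNotification, the sketch below splits such a string into its two parts and rejects malformed input. The `parse_tag` helper is hypothetical and is not taken from the solution code; it only shows one reasonable way to interpret the parameter.

```python
def parse_tag(parameter):
    """Split a "key:value" string into (key, value), validating both parts."""
    # partition() splits on the first colon, so the value may itself contain colons.
    key, separator, value = parameter.partition(":")
    if not separator or not key or not value:
        raise ValueError(f"expected a key:value pair, got {parameter!r}")
    return key, value


key, value = parse_tag("RepeatedAlarm:true")
# The same pair, in the form used by the "aws cloudwatch tag-resource" --tags option.
print(f"Key={key},Value={value}")
```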

Step 2: Wait for the deployment to finish

Because this is a new deployment, you will see a summary of IAM resources created in the target account. These IAM resources are used by the components in the solution. No change is performed to any existing IAM resources in your account. You can review the change and accept by entering “y” to continue the deployment.

Before the solution is deployed to the account, the CDK CLI shows a summary of the IAM resources to be created and asks for your confirmation.

Then, you will see the progress of the deployment in your terminal. Wait for it to finish. You can also follow the progress of the deployment in the AWS CloudFormation console.

Step 3: Test the solution

Once the deployment completes, you can test the solution on an alarm by applying the tag that you used.

Find a test alarm that is in the ALARM state and has SNS alarm actions associated.
Apply the tag on the selected alarm with the following AWS CLI command:

aws cloudwatch tag-resource --resource-arn arn:aws:cloudwatch:<region>:<account_id>:alarm:<alarm_name> --tags Key=RepeatedAlarm,Value=true

Manually set the alarm state to OK by using the set-alarm-state CLI command, so that the next evaluation produces a fresh transition to the ALARM state:

aws cloudwatch set-alarm-state --alarm-name <alarm_name> --state-value OK --state-reason "test"

Wait for the next alarm evaluation. A standard alarm re-evaluates within one minute and transitions back to its actual state.
Verify that you receive an ALARM notification every five minutes (the 300-second RepeatedNotificationPeriod set in Step 1). The repeated notification will have a subject similar to the following:

Repeated alarm notification has a subject of “ALARM: <alarm-name> remains in ALARM state in <region>”

Step 4: View all of the alarms that have repeated notification enabled

AWS Resource Groups lets you search and group AWS resources based on tag. In this post, I will show you how to use this to have a centralized view of all of the alarms with repeated notification enabled.

Go to the Resource Groups & Tag Editor console.

In the AWS console, search for the service called “Resource Groups & Tag Editor”

If you set RequireResourceGroup to true when deploying the CDK application, you will see a tag-based resource group named “repeatedAlarmsGroup”.

Under Saved resource groups, you should see a tag-based resource group named “repeatedAlarmsGroup”

You can now view all of the alarms with repeated notification enabled.

You can see the details of the “repeatedAlarmsGroup” and a list of the CloudWatch alarms in this Region that have the repeated notification tag

Step 5: Disable repeated notification and untag the alarm

Run the following CLI command to untag the CloudWatch alarm. You should see the alarm disappear from the resource group created in the previous step as well:

aws cloudwatch untag-resource --resource-arn arn:aws:cloudwatch:<region>:<account_id>:alarm:<alarm_name> --tag-keys RepeatedAlarm

Cleanup

To avoid additional infrastructure costs from the examples described in this post, make sure to delete all of the resources created. You can clean up the resources by running the following commands:

cd amazon-cloudwatch-alarms-repeated-notification-cdk
cdk destroy

In addition, the Lambda function created in this solution logs to a CloudWatch Logs log group with the prefix “/aws/lambda/RepeatedCloudWatchAlarm”. Make sure to delete the log group to avoid CloudWatch Logs storage charges.
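If you prefer to script the log group cleanup, the following sketch lists the log groups under that prefix and deletes them. The `delete_log_groups` helper is an assumption made for illustration. It expects a boto3-style CloudWatch Logs client, such as `boto3.client("logs")`, passed in as an argument, and it uses the `describe_log_groups` and `delete_log_group` operations. Review the returned names before running something like this against a real account, because deletion is permanent.

```python
def delete_log_groups(logs, prefix="/aws/lambda/RepeatedCloudWatchAlarm"):
    """Delete every log group whose name starts with `prefix`; return their names.

    `logs` is a boto3-style CloudWatch Logs client, e.g. boto3.client("logs").
    """
    # Collect all matching names first, following pagination tokens.
    names = []
    kwargs = {"logGroupNamePrefix": prefix}
    while True:
        page = logs.describe_log_groups(**kwargs)
        names.extend(group["logGroupName"] for group in page.get("logGroups", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    # Delete only after listing is complete, so pagination is not disturbed.
    for name in names:
        logs.delete_log_group(logGroupName=name)
    return names
```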

Conclusion

In this post, I’ve provided you with a solution that enables repeated notifications on CloudWatch alarms by using the alarm’s state change event via Amazon EventBridge and AWS Step Functions. With this solution, you are less likely to miss mission-critical alarms, and you can improve your incident response time. The same framework can also be extended to handle more advanced alarm processing tasks.

AWS Cloud Operations Blog
