How can I be sure that CloudWatch alarms activate actions?


My Amazon CloudWatch alarm isn't activated even though I can see from my CloudWatch graphs that the alarm metric exceeds the configured threshold. How can I be sure that my CloudWatch alarms are activated and the alarm actions are performed?

Short description

CloudWatch alarms that measure time-aggregated metrics (such as five-minute averages) evaluate those metrics continuously in a rolling window. If any data point collected during the evaluation period doesn't exceed the configured threshold, then the CloudWatch alarm isn't activated, even when the other data points in that window exceed it.

CloudWatch alarms start actions only when the alarm changes state, and the alarm changes state only after the threshold condition is maintained for the specified number of periods. For more information, see Creating CloudWatch alarms.

Important: There is an exception to this behavior for CloudWatch alarms that are associated with Amazon EC2 Auto Scaling actions. A CloudWatch alarm keeps activating Auto Scaling actions when that alarm is in a specified state. This happens even if there are no state changes and the alarm remains in that state.
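The difference between the two behaviors can be illustrated with a small, simplified model. The function name `action_steps`, the `repeat_while_in_state` flag, and the sample state sequence are hypothetical and exist only for this sketch. The model ignores any additional logic that Amazon EC2 Auto Scaling applies on its side.

```python
from typing import List, Optional


def action_steps(
    states: List[str],
    target: str = "ALARM",
    repeat_while_in_state: bool = False,
) -> List[int]:
    """Return the indexes of the evaluations at which an action tied to
    `target` would be started in this simplified model."""
    fired: List[int] = []
    previous: Optional[str] = None  # state before the first evaluation is unknown
    for index, state in enumerate(states):
        changed = state != previous
        # Standard actions need a state change into `target`.
        # The Auto Scaling-style mode fires on every evaluation in `target`.
        if state == target and (changed or repeat_while_in_state):
            fired.append(index)
        previous = state
    return fired


# Hypothetical sequence of alarm states, one per evaluation.
states = ["OK", "ALARM", "ALARM", "ALARM", "OK", "ALARM"]

standard = action_steps(states)
auto_scaling_style = action_steps(states, repeat_while_in_state=True)
print(standard)            # evaluations where a standard action starts
print(auto_scaling_style)  # evaluations where an Auto Scaling-style action starts
```

In this model, the standard action starts only at the two transitions into ALARM, while the Auto Scaling-style action starts at every evaluation where the state is ALARM.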

Resolution

Be sure to consider the mechanism used by CloudWatch to measure time-aggregated metrics when you create alarms.

Consider lowering the metric data thresholds to be sure the alarm works as you expect.
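One way to reason about a threshold is to look at the lowest data point in each rolling window. The following sketch assumes that every data point in the evaluation window must exceed the threshold and that no data is missing. Under those assumptions, the alarm condition is met only when the window minimum exceeds the threshold. The helper `rolling_minimum` is hypothetical, and the values are the five-minute averages from the troubleshooting example in this article.

```python
from typing import List


def rolling_minimum(values: List[float], evaluation_periods: int) -> List[float]:
    """Return the minimum of each window of `evaluation_periods` values."""
    return [
        min(values[end - evaluation_periods + 1 : end + 1])
        for end in range(evaluation_periods - 1, len(values))
    ]


# Five-minute averages from 05:25 through 06:00.
averages = [61.123, 57.847, 60.503, 55.473, 41.685, 58.390, 57.846, 61.123]

minima = rolling_minimum(averages, evaluation_periods=3)

# Compare the configured threshold with a lower, hypothetical one.
breaching_at_45 = [minimum > 45 for minimum in minima]
breaching_at_40 = [minimum > 40 for minimum in minima]

print(minima)
print(breaching_at_45)
print(breaching_at_40)
```

With the threshold of 45, three consecutive windows fail the condition because they contain the 41.685 data point. With a hypothetical threshold of 40, every window meets the condition.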

Troubleshooting example

In this example, there is an alarm based on average CPU utilization. The alarm is configured with a threshold of > 45, a period of 300 seconds, and three evaluation periods. The metric must therefore exceed the threshold for three consecutive five-minute periods before the alarm is activated. The alarm evaluates the following time-aggregated metrics:

05:25:00: data: {Avg=61.123}
05:30:00: data: {Avg=57.847}
05:35:00: data: {Avg=60.503}
05:40:00: data: {Avg=55.473}
05:45:00: data: {Avg=41.685}
05:50:00: data: {Avg=58.390}
05:55:00: data: {Avg=57.846}
06:00:00: data: {Avg=61.123}

These data points result in the following alarm states:

05:35 ALARM
05:40 ALARM
05:45 ALARM to OK
05:50 OK
05:55 OK
06:00 OK to ALARM

The data point collected at 05:55 exceeds the Average CPU Utilization threshold of 45%. However, the alarm remains in the OK state and doesn't activate the action at 05:55. This happens because the evaluation at 05:55 covers the data points from 05:45, 05:50, and 05:55, and the data point collected at 05:45:00 doesn't exceed the threshold. Five minutes later, the 05:45 data point drops out of the window, so the alarm state changes from OK to ALARM at 06:00 and the alarm starts the action.
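The alarm states listed above can be reproduced with a short simulation. This is a minimal sketch, not the CloudWatch implementation: it assumes that all three data points in the window must exceed the threshold and that no data is missing. The function name `evaluate` is hypothetical.

```python
from typing import List, Tuple


def evaluate(
    datapoints: List[Tuple[str, float]],
    threshold: float,
    evaluation_periods: int,
) -> List[Tuple[str, str]]:
    """Return (timestamp, state) for each evaluation that has a full window."""
    results: List[Tuple[str, str]] = []
    for end in range(evaluation_periods - 1, len(datapoints)):
        window = datapoints[end - evaluation_periods + 1 : end + 1]
        # The state is ALARM only if every data point in the window breaches.
        breaching = all(value > threshold for _, value in window)
        results.append((datapoints[end][0], "ALARM" if breaching else "OK"))
    return results


datapoints = [
    ("05:25", 61.123),
    ("05:30", 57.847),
    ("05:35", 60.503),
    ("05:40", 55.473),
    ("05:45", 41.685),
    ("05:50", 58.390),
    ("05:55", 57.846),
    ("06:00", 61.123),
]

states = evaluate(datapoints, threshold=45, evaluation_periods=3)
for timestamp, state in states:
    print(timestamp, state)
```

The 05:45 data point stays in the window for the evaluations at 05:45, 05:50, and 05:55, which is why the simulated state is OK for all three.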

For the following time-aggregated metrics, the alarm state is ALARM from 05:35 onward because all the data points exceed the Average CPU Utilization threshold of 45%. Even the lowest value, 45.075 at 05:45, is above 45. Because there are no state changes after 05:35, the alarm action isn't activated during this time.

05:25:00: data: {Avg=61.123}
05:30:00: data: {Avg=57.847}
05:35:00: data: {Avg=60.503}
05:40:00: data: {Avg=55.473}
05:45:00: data: {Avg=45.075}
05:50:00: data: {Avg=58.390}
05:55:00: data: {Avg=57.847}
06:00:00: data: {Avg=61.123}
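To confirm that this data set produces no state changes after 05:35, the following sketch lists only the evaluations where the simulated state differs from the previous one. It assumes that all three data points in the window must exceed the threshold and that the state before the first full window is unknown. The function name `transitions` is hypothetical.

```python
from typing import List, Optional, Tuple


def transitions(
    datapoints: List[Tuple[str, float]],
    threshold: float,
    evaluation_periods: int,
) -> List[Tuple[str, str]]:
    """Return (timestamp, new_state) for each evaluation that changes state."""
    changes: List[Tuple[str, str]] = []
    previous: Optional[str] = None  # unknown before the first full window
    for end in range(evaluation_periods - 1, len(datapoints)):
        window = [value for _, value in datapoints[end - evaluation_periods + 1 : end + 1]]
        state = "ALARM" if min(window) > threshold else "OK"
        if state != previous:
            changes.append((datapoints[end][0], state))
        previous = state
    return changes


datapoints = [
    ("05:25", 61.123),
    ("05:30", 57.847),
    ("05:35", 60.503),
    ("05:40", 55.473),
    ("05:45", 45.075),
    ("05:50", 58.390),
    ("05:55", 57.847),
    ("06:00", 61.123),
]

changes = transitions(datapoints, threshold=45, evaluation_periods=3)
print(changes)
```

The only entry in the result is the first evaluation at 05:35. No later evaluation changes the state, so in this model nothing after 05:35 would start a state-change action.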

Related information

Dynamic scaling for Amazon EC2 Auto Scaling

Viewing available metrics
