How do I troubleshoot my CloudWatch alarm in the INSUFFICIENT_DATA state?

I want to troubleshoot my Amazon CloudWatch alarm in the INSUFFICIENT_DATA state.

Short description

When you create a CloudWatch alarm, its initial state is INSUFFICIENT_DATA. The alarm remains in this state until the first evaluation of the monitored metric completes. Typically, an alarm transitions out of the INSUFFICIENT_DATA state within a few minutes of creation. This is normal behavior for a new alarm.

These are possible causes that keep your CloudWatch alarm in the INSUFFICIENT_DATA state:

  • The metric has missing data points.
  • The metric parameters are misconfigured.
  • The alarm periods are misconfigured.
  • The delivery of data points is delayed due to a lack of connectivity.

Resolution

To troubleshoot your CloudWatch alarm in the INSUFFICIENT_DATA state, check the following possible causes:

The metric has missing data points

A metric that monitors events such as infrastructure changes, network failures, and service disruptions reports data points only when those events occur, not at regular intervals. If an alarm has no metric data points in a specified time period, then the data points are missing and the alarm state is INSUFFICIENT_DATA.

To resolve an INSUFFICIENT_DATA state that's caused by missing data points, make sure that you configure how the alarm handles missing data points. Set the alarm's TreatMissingData parameter to notBreaching to treat missing data points as good and within the threshold, or to ignore to maintain the current alarm state. For more information, see Configuring how CloudWatch alarms treat missing data.
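The following is a minimal sketch of that change, assuming Python with the boto3 SDK. In the PutMetricAlarm API, the missing-data setting is the TreatMissingData parameter. Because PutMetricAlarm replaces the whole alarm configuration, the sketch copies the existing settings from DescribeAlarms and changes only that one parameter. The helper names and the WRITABLE_FIELDS list are illustrative assumptions, and the list covers only a single-metric alarm.

```python
# Fields returned by DescribeAlarms that PutMetricAlarm also accepts.
# Illustrative subset for a single-metric alarm (assumption); extend it if
# your alarm uses other settings, such as Metrics for metric math.
WRITABLE_FIELDS = (
    "AlarmName", "AlarmDescription", "ActionsEnabled", "OKActions",
    "AlarmActions", "InsufficientDataActions", "MetricName", "Namespace",
    "Statistic", "ExtendedStatistic", "Dimensions", "Period", "Unit",
    "EvaluationPeriods", "DatapointsToAlarm", "Threshold",
    "ComparisonOperator", "EvaluateLowSampleCountPercentile",
)

VALID_TREATMENTS = ("breaching", "notBreaching", "ignore", "missing")


def build_put_alarm_args(alarm, treat_missing_data):
    """Copy an alarm's writable settings and override TreatMissingData."""
    if treat_missing_data not in VALID_TREATMENTS:
        raise ValueError(f"unsupported value: {treat_missing_data!r}")
    # Drop read-only fields such as AlarmArn and StateValue.
    args = {key: alarm[key] for key in WRITABLE_FIELDS if key in alarm}
    args["TreatMissingData"] = treat_missing_data
    return args


def set_missing_data_treatment(alarm_name, treat_missing_data):
    """Read the alarm, then write it back with the new setting."""
    import boto3  # imported here so the helper above runs without boto3

    cloudwatch = boto3.client("cloudwatch")
    alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    if not alarms:
        raise LookupError(f"no metric alarm named {alarm_name!r}")
    cloudwatch.put_metric_alarm(
        **build_put_alarm_args(alarms[0], treat_missing_data)
    )
```

For example, set_missing_data_treatment("my-example-alarm", "notBreaching") would update a hypothetical alarm named my-example-alarm. Review the copied settings before you write them back, because any setting that isn't in the list is reset.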

The metric parameters are misconfigured

Each metric is defined by a namespace, a metric name, and up to 30 dimensions. When you retrieve a data point, you must specify a timestamp and can optionally specify a unit. If you provide an incorrect value for one of these parameters, then CloudWatch tries to retrieve a metric that doesn't exist. This results in an empty dataset.

Note: Data points are usually pushed to a metric with a single unit, so you aren't required to specify the unit when you create an alarm, and omitting the unit doesn't cause configuration issues. If your metric's data points use multiple units, then it's a best practice to specify the correct unit.

To resolve an INSUFFICIENT_DATA state that's caused by misconfigured parameters, complete the following steps:

  1. Run the DescribeAlarms API command to get a complete list of parameters for your monitored metrics.
  2. Run the ListMetrics API command. Compare the ListMetrics output to the list of parameters for your monitored metrics.
  3. Check the metric parameters for misspellings, improper use of lowercase and uppercase letters, and incorrect or missing dimensions.
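The comparison in the preceding steps can be sketched as follows, assuming Python with the boto3 SDK. The function names and the result labels ("exact", "case_mismatch", "dimension_mismatch", "not_found") are hypothetical and chosen for illustration. The comparison works on the Namespace, MetricName, and Dimensions fields that both DescribeAlarms and ListMetrics return.

```python
def _signature(metric, fold_case=False):
    """Build a comparable key from namespace, name, and dimensions."""
    norm = str.lower if fold_case else str
    dims = sorted(
        (norm(d["Name"]), norm(d["Value"]))
        for d in metric.get("Dimensions", [])
    )
    return (norm(metric["Namespace"]), norm(metric["MetricName"]), tuple(dims))


def find_metric_matches(alarm, listed_metrics):
    """Classify how the alarm's metric relates to the ListMetrics output."""
    target = _signature(alarm)
    exact = [m for m in listed_metrics if _signature(m) == target]
    if exact:
        return "exact", exact

    # Same metric apart from uppercase and lowercase letters.
    folded = _signature(alarm, fold_case=True)
    case_only = [
        m for m in listed_metrics if _signature(m, fold_case=True) == folded
    ]
    if case_only:
        return "case_mismatch", case_only

    # Namespace and name match, so the dimensions must differ.
    same_name = [
        m for m in listed_metrics
        if (m["Namespace"], m["MetricName"]) == target[:2]
    ]
    if same_name:
        return "dimension_mismatch", same_name
    return "not_found", []


def fetch_listed_metrics(namespace=None):
    """Collect ListMetrics results, optionally limited to one namespace."""
    import boto3  # imported here so the helpers above run without boto3

    kwargs = {"Namespace": namespace} if namespace else {}
    paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    return [m for page in paginator.paginate(**kwargs) for m in page["Metrics"]]
```

A "case_mismatch" or "dimension_mismatch" result returns the closest candidates, which you can compare by eye with the alarm's configuration. If the alarm's namespace itself might be misspelled, call fetch_listed_metrics without a namespace filter.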

The alarm periods are misconfigured

You can configure an alarm to retrieve data points at your required frequency. However, if the alarm's period is shorter than the interval at which the service or source publishes data points, then some periods contain no data and you might get unwanted alarm states. To resolve this, configure your alarm's period to be greater than or equal to the interval at which the metric's data points are pushed. You can also set your alarm to use M out of N settings, so that only M of the last N evaluated data points must breach the threshold. For more information, see Evaluating an alarm.
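As a rough illustration of the period check, the following Python sketch estimates how often a source publishes data points from their timestamps and then suggests an alarm period. It assumes a standard-resolution alarm, where the period is a multiple of 60 seconds, and ignores high-resolution periods. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta
from statistics import median


def estimate_publish_interval(timestamps):
    """Return the median gap, in seconds, between consecutive data points."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def suggest_alarm_period(publish_interval_seconds):
    """Smallest multiple of 60 seconds that covers the publish interval."""
    return max(60, math.ceil(publish_interval_seconds / 60) * 60)


# Example with made-up timestamps that are five minutes apart.
start = datetime(2024, 1, 1, 12, 0, 0)
sample = [start + timedelta(minutes=5 * i) for i in range(4)]
interval = estimate_publish_interval(sample)
period = suggest_alarm_period(interval)
```

In this example, the data points arrive every 300 seconds, so an alarm period of 60 seconds would leave most periods empty. For the M out of N settings, PutMetricAlarm uses EvaluationPeriods as N and DatapointsToAlarm as M.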

Delayed delivery of data points due to lack of connectivity

If you have a standard alarm setup, then the alarm evaluates your metric every minute. When the metric is evaluated, the alarm retrieves the available data points from the configured metric. If the source of the metric doesn't have connectivity, then it can't deliver data points to CloudWatch, and the alarm has no data points to retrieve when it evaluates the metric. The alarm then transitions to the INSUFFICIENT_DATA state.

After connectivity is restored, the backlog of data points, with their original timestamps, is sent to CloudWatch. When the data points arrive after the delay, the alarm retrieves recent data points based on the period and evaluation periods specified. The gaps in the metric are filled in, and the data points are stored in CloudWatch. However, the alarm has already evaluated that timeframe, so the alarm history still shows an INSUFFICIENT_DATA state with a state reason similar to the following:

   "stateValue": "INSUFFICIENT_DATA",
   "stateReason": "Insufficient Data: 2 datapoints were unknown."

To resolve an INSUFFICIENT_DATA state that's caused by a delayed delivery of data points, configure how your CloudWatch alarm treats missing data.
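To see when an alarm entered the INSUFFICIENT_DATA state, you can scan its history. The following Python sketch assumes the boto3 SDK and assumes that each StateUpdate history item carries a JSON document in HistoryData with a newState object that holds stateValue and stateReason, as in the preceding excerpt. The function names are hypothetical.

```python
import json


def insufficient_data_transitions(history_items):
    """Return (timestamp, reason) for each change to INSUFFICIENT_DATA."""
    found = []
    for item in history_items:
        try:
            data = json.loads(item.get("HistoryData", ""))
        except json.JSONDecodeError:
            continue  # skip items without parseable history data
        if not isinstance(data, dict):
            continue
        new_state = data.get("newState", {})
        if new_state.get("stateValue") == "INSUFFICIENT_DATA":
            found.append(
                (item.get("Timestamp"), new_state.get("stateReason", ""))
            )
    return found


def fetch_state_updates(alarm_name):
    """Collect the alarm's StateUpdate history items."""
    import boto3  # imported here so the parser above runs without boto3

    paginator = boto3.client("cloudwatch").get_paginator(
        "describe_alarm_history"
    )
    pages = paginator.paginate(
        AlarmName=alarm_name, HistoryItemType="StateUpdate"
    )
    return [item for page in pages for item in page["AlarmHistoryItems"]]
```

If the timestamps line up with known connectivity outages and the metric graph now shows data for those periods, then delayed delivery is the likely cause.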

AWS OFFICIAL · Updated a month ago