AWS DevOps Blog

Reduce Time to Resolution with Amazon CloudWatch Snapshot Graphs and Alerts

Steve McCurry is a Senior Product Manager for Amazon CloudWatch.

This is the first in a series of two blog posts that show how to use the new Amazon CloudWatch
snapshot graphs feature.

Although automated alerts are an important feature of any monitoring solution, including
Amazon CloudWatch, it can be challenging to identify the issues that matter in the noise of
monitoring alerts. Reducing the time to resolution depends on being able to make quick
decisions around the importance of alerts.

When you receive an alert, you ask, “Is this issue urgent?” Unfortunately, a page or email alert
that contains a text description about a symptom doesn’t provide enough context to answer the
question. You need to correlate the alert with metrics to understand what was happening around
the time of the alert. This digging for more information slows down the process of resolving the
issue.

This blog post shows you how to add richer context to an email alert by attaching a CloudWatch
snapshot graph. The snapshot graph shows the behavior of the underlying metric for the three
hours leading up to the alert.

Snapshot graphs overview

You can use snapshot graphs to integrate and display CloudWatch charts outside of the AWS
Management Console to improve monitoring visibility or reduce time to resolution. This feature
makes it possible for you to display CloudWatch charts on your webpage or integrate charts with
third-party tools, such as ticketing, chat applications, and bug tracking.

CloudWatch snapshot graphs are images of CloudWatch charts that are useful for building
custom dashboards. Although the images are static, they can be refreshed frequently to create a
live dashboard experience.

Snapshot graphs are available through the CloudWatch API, which you can use through the
AWS SDKs or AWS CLI. The charts you request through the API are represented as JSON. To
copy the JSON definition of the chart and use it in the API request, open the Amazon
CloudWatch console. You’ll find the JSON on the Source tab of the Metrics page, as shown
here.

All of the features of the CloudWatch line and stacked charts are available in snapshot graphs,
including vertical and horizontal annotations. The example in this post uses horizontal annotations.

Adding context to an EC2 monitoring alert

In this post, we will set up monitoring for an EC2 instance and generate a monitoring alert. The
alert contains details and a chart that displays the trend of the underlying metric (CPUUtilization)
for three hours leading up to the alert. To follow along, use the code in the SnapshotAlarmDemo
GitHub repo.

These are the steps required for the monitoring workflow:

  1. Create an EC2 instance to monitor.
  2. Create a Lambda function that creates an email alert with a snapshot graph attachment.
  3. Create an Amazon Simple Notification Service (Amazon SNS) topic with a Lambda
    subscription.
  4. Configure Amazon Simple Email Service (Amazon SES) to send email to your
    account(s).
  5. Create an alarm on a metric in CloudWatch whose target is the SNS topic.

After these components are set up and configured, the alarm will be triggered when the metric
breaches the alarm threshold value. The alarm will trigger the SNS topic and the Lambda
function will be executed. The Lambda function will interrogate the SNS message to extract the
details of the underlying metric and will call the Snapshot Graphs API to create the
corresponding chart. The Lambda function will also create an email and add the chart as an
attachment before using SES to send it. The operator will receive the email alert and can view
the chart immediately, without signing in to AWS.

Here is what the end-to-end solution looks like:

Create the EC2 instance to monitor

  1. Open the Amazon EC2 console.
  2. From the console dashboard, choose Launch Instance.
  3. On the Choose an Instance Type page, choose any instance type. For size, choose nano.
  4. Choose Review and Launch to let the wizard complete the other configuration settings
    for you.
  5. On the Review Instance Launch page, choose Launch.
  6. When prompted for a key pair, select Choose an existing key pair, or create one.
  7. Make a note of the instance ID that is displayed on the Launch Status page. You need
    this later, when you create your dashboard.

Create the Lambda function

First, create an IAM role for your Lambda function. Your Lambda function needs a role with the
permissions required to execute Lambda and call the CloudWatch API.

Step 1: Create the IAM role

Navigate to the IAM console.

In the navigation pane, choose Roles, and then choose Create role.

Add the following policies to the new role. These policies are more permissive than the
minimum permissions required for this example, so review and adjust according to your
requirements:

  • CloudWatchReadOnlyAccess
  • AmazonSESFullAccess
  • AmazonSNSReadOnlyAccess
  • AWSLambdaBasicExecutionRole

Step 2: Create the Lambda function

Next, navigate to the AWS Lambda console and choose Create a function. If you are using the
demo code provided with this post, choose Node.js 6.10 (or later) for the runtime. Attach the
IAM role you created in step 1.

Step 3: Upload the code

Download the code from the SnapshotAlarmDemo GitHub repo. You will need to run npm install locally and then ZIP the project to upload to the Lambda function.

For Handler, enter emailer.myHandler.

Set the function timeout to 30 seconds, and then choose Save. Requests to the Snapshot Graphs
service will take longer based on the number of metrics and time interval requested. To optimize
request time, keep requests to under three hours of the most recent data.

Step 4: Set the environment variables

The email address and region used by the Lambda function are configured in the environment
variables.

EMAIL_TO_ADDRESS is the email address where the alert email will be sent.
EMAIL_FROM_ADDRESS is the sender email address.

Create the SNS topic

Navigate to the Amazon SNS console and choose Create Topic.

Create a subscription on the topic. For Endpoint, enter your Lambda function.

Configure SES

Navigate to the Amazon SES console. In the left navigation pane, choose Email Addresses.
Choose Verify a New Email Address, and then enter the email address.

SES will send a verification email to the selected address. To verify the account, you must click
the link in the verification email.

Create an alarm in CloudWatch

Navigate to the Amazon CloudWatch console. In the left navigation pane, choose Alarms, and
then choose Create Alarm.

For the Select Metric step, select an instance and metric (CPUUtilization, as shown here). For
the Define Alarm step, enter a unique name and threshold value. Use your SNS topic as the
target action. Change the period to 1 minute. Create the alarm.

Testing the workflow

To simulate a problem, let’s manually change the threshold of the metric to a very low value.

Save the alarm and then wait for the alarm to go into an error state. (This can take a couple of
minutes.) As soon as your alarm is in the error state, you should receive the alert email almost
immediately.

The email contains information about the alarm. The chart of the last three hours of the
associated metric is embedded in the email:

Troubleshooting

If you didn’t receive an email, make sure that the alarm is in the alarm state. If it is, check the
AWS Lambda logs, which you’ll find on the Logs tab of the CloudWatch console.

Conclusion

As you have seen in this demo, you can create CloudWatch alarm workflows that provide more
context than a basic text alert.

In my next post in this series, I will show you other ways to use CloudWatch snapshot graphs to
improve monitoring visibility.

For more information, see the snapshot graphs API documentation.

Taking it further

Most popular ticketing systems allow you to send emails that autogenerate tickets. You can use
the code provided in this post to email your ticketing system to create a ticket for the alarm and
automatically attach the snapshot graph in the body of the ticket. Try it!

You can also use the code provided in this post as a starting point to programmatically email an
entire CloudWatch dashboard rather than an individual chart. Simply retrieve the dashboard
definition using the GetDashboard API and make multiple calls to GetMetricWidgetImage.

I look forward to seeing what you build!