AWS Cloud Operations Blog

Deliver ML-powered operational insights to your on-call teams via PagerDuty with Amazon DevOps Guru

Amazon DevOps Guru, now in preview, is an ML-powered cloud operations service that assists you in improving application availability. It’s easy to set up and use, and leverages machine learning models informed by years of operational expertise in building, scaling, and maintaining highly available applications at Amazon.com. DevOps Guru continuously analyzes streams of disparate data and monitors thousands of metrics to establish normal bounds for application behavior. DevOps Guru identifies deviations from normal conditions in your application. It analyzes metrics, logs, events, and traces within your account and surfaces high severity issues. It correlates relevant data and provides recommendations so that you can react quickly to operational issues.

PagerDuty, an AWS Partner Network (APN) Advanced Technology Partner, is an incident management platform. PagerDuty provides features such as reliable notifications, automatic escalations, and on-call scheduling. Using PagerDuty’s DevOps Guru integration, you will be able to detect and fix infrastructure problems quickly.

In this post, I will walk you through how to enable Amazon DevOps Guru in your AWS account and configure PagerDuty to receive operational insights.

Architecture

Here is the architecture you will be creating.

Amazon DevOps Guru sends insights to Amazon SNS and Amazon SNS forwards the insights to PagerDuty

You will perform the following:

    1. Configure the Amazon DevOps Guru integration in PagerDuty.
    2. Create an Amazon SNS topic to forward insights from Amazon DevOps Guru to PagerDuty.
    3. Enable DevOps Guru in your AWS account.

Configure PagerDuty

First, enable the Amazon DevOps Guru integration in your PagerDuty account. If you do not have a PagerDuty account, sign up for one on the PagerDuty website. Once you are logged in to PagerDuty, follow these steps:

Step 1: Configure the Amazon DevOps Guru integration

1.     Open your PagerDuty console and select the Services page from the page header.

Select the Services link in the PagerDuty console

2.     Next, click the New Service button.

Choose the New Service button

3.     Create a new service with the name Operational Insights and give it a detailed description. In Integration Settings, search for Amazon DevOps Guru under the Integration Type dropdown.

Select Amazon DevOps Guru from integration

4.     Keep the defaults in place and select the Add Service button.

Choose the Add Service button

5.     Now, under the newly created service, click the Integrations tab, click the name of the Amazon DevOps Guru integration, and copy the integration URL. You will need this URL in the next step.

Copy the integration URL

Step 2: Create an Amazon SNS topic and subscribe the PagerDuty integration

1.     Navigate to the Amazon SNS topics console and click Create topic.

Create an Amazon SNS topic

2.     Choose the Standard topic type and give your topic a name like operational-insights. Leave the default settings as they are or configure them to suit your needs, then click Create topic.

Input details of the Amazon SNS topic

3.     After the topic has been created, scroll down to the subscriptions panel and click Create subscription.

Create an Amazon SNS subscription

4.     Select HTTPS as your protocol and paste the integration URL you copied in Step 1 as the endpoint. Leave the remaining options as the defaults or configure them to meet your needs. Click Create subscription.

Input details of the subscription

Amazon SNS sends a confirmation message to your PagerDuty integration. PagerDuty automatically approves this subscription.

PagerDuty can now receive notifications from DevOps Guru.
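If you prefer to script Step 2 rather than click through the console, the following is a minimal sketch using the AWS SDK for Python (boto3). It assumes boto3 is installed and that your credentials and Region are already configured. The function names are my own and are not part of any AWS or PagerDuty tooling, and the integration URL you pass in is whatever you copied from PagerDuty in Step 1.

```python
from urllib.parse import urlparse


def build_subscribe_request(topic_arn: str, integration_url: str) -> dict:
    """Build the keyword arguments for the SNS Subscribe call.

    Rejects anything that is not an https:// URL, because the
    subscription in this post uses the HTTPS protocol.
    """
    parsed = urlparse(integration_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("The PagerDuty integration URL must be an https:// URL")
    return {
        "TopicArn": topic_arn,
        "Protocol": "https",
        "Endpoint": integration_url,
    }


def create_topic_and_subscribe(
    integration_url: str, topic_name: str = "operational-insights"
) -> str:
    """Create a standard SNS topic and subscribe the PagerDuty integration URL.

    Returns the ARN of the topic. boto3 is imported lazily so that the
    request builder above can be used without the AWS SDK installed.
    """
    import boto3

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(**build_subscribe_request(topic_arn, integration_url))
    return topic_arn
```

Calling create_topic_and_subscribe with your integration URL returns the topic ARN, which you will need when you connect DevOps Guru to the topic in Step 3.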

Step 3: Set up Amazon DevOps Guru

1.     Navigate to the Amazon DevOps Guru Console.

Amazon DevOps Guru Console

2.     Click on Settings.

Amazon DevOps Guru Console setting button

3.     Click the Manage button and select the Resources you wish to monitor.

Resource monitor options

4.     Under the SNS notifications section, click Edit. Select the Amazon SNS topic that you just created from the list of topics, then click the Enable button.

Select Amazon SNS topic
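The console steps above can also be approximated with boto3. The sketch below assumes you want DevOps Guru to monitor specific AWS CloudFormation stacks, which is only one of the resource options the console offers, and that the topic ARN comes from Step 2. The function names and any stack names you pass in are illustrative and not taken from this post.

```python
def build_resource_collection_request(stack_names: list) -> dict:
    """Build the arguments for UpdateResourceCollection.

    Assumption: resources are selected by CloudFormation stack name.
    """
    names = [name for name in stack_names if name]
    if not names:
        raise ValueError("Provide at least one CloudFormation stack name")
    return {
        "Action": "ADD",
        "ResourceCollection": {"CloudFormation": {"StackNames": names}},
    }


def build_notification_channel_request(topic_arn: str) -> dict:
    """Build the arguments for AddNotificationChannel."""
    if not topic_arn.startswith("arn:"):
        raise ValueError("Expected an SNS topic ARN")
    return {"Config": {"Sns": {"TopicArn": topic_arn}}}


def enable_devops_guru(topic_arn: str, stack_names: list) -> str:
    """Select the stacks to monitor and attach the SNS topic.

    Returns the ID of the new notification channel. boto3 is imported
    lazily so the builders above work without the AWS SDK installed.
    """
    import boto3

    guru = boto3.client("devops-guru")
    guru.update_resource_collection(
        **build_resource_collection_request(stack_names)
    )
    response = guru.add_notification_channel(
        **build_notification_channel_request(topic_arn)
    )
    return response["Id"]
```

Keeping the request builders separate from the API calls makes it easy to review exactly what will be sent before anything changes in your account.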

Using the integration

At this point, Amazon DevOps Guru will start monitoring your resources and learning what’s normal behavior for your applications. When an operational issue occurs, it generates insights with a summary of related anomalies, contextual information about the problem, and (when possible) actionable recommendations for remediation. For example, in the image below, DevOps Guru has noticed an anomaly on a DynamoDB stack in my account.

Amazon DevOps Guru insights

When I click the insight’s name link, I’m able to see more details about the insight.

Amazon DevOps Guru insight details

Amazon DevOps Guru provides detailed metrics associated with the finding and makes recommendations on ways to remediate the issue.

Amazon DevOps Guru recommendations
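The insights shown in the console can also be read programmatically. Here is a small sketch, again assuming boto3 with credentials already configured, that lists ongoing reactive insights and reduces each one to its severity and name. The helper names are hypothetical, and for brevity the sketch reads only the first page of results.

```python
def ongoing_reactive_filter() -> dict:
    """Build the arguments for ListInsights: ongoing, reactive insights only."""
    return {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}


def summarize_insights(response: dict) -> list:
    """Reduce a ListInsights response to (severity, name) pairs."""
    return [
        (insight.get("Severity", "UNKNOWN"), insight.get("Name", ""))
        for insight in response.get("ReactiveInsights", [])
    ]


def list_ongoing_insights() -> list:
    """Fetch the first page of ongoing reactive insights.

    boto3 is imported lazily so the helpers above work without the SDK.
    A fuller version would follow NextToken to read later pages.
    """
    import boto3

    guru = boto3.client("devops-guru")
    return summarize_insights(guru.list_insights(**ongoing_reactive_filter()))
```

This kind of listing is handy for a quick check from a terminal or a scheduled job, while PagerDuty remains the place where your on-call team is actually notified.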

The insight is sent to PagerDuty via an Amazon SNS notification. PagerDuty creates an incident with all the critical details of the insight.

PagerDuty incident about Amazon DevOps Guru insight

In PagerDuty, the on-call personnel can see information about the insight by clicking on the incident title. The information includes a URL to the insight, which lets them view the metrics and recommendations for that insight in the Amazon DevOps Guru console.

Amazon DevOps Guru insight URL
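If you are curious what an HTTPS endpoint such as the PagerDuty integration receives from Amazon SNS, the sketch below parses the JSON envelope that SNS posts to HTTPS subscribers. It distinguishes the one-time subscription confirmation from regular notifications. The envelope fields (Type, TopicArn, Message, SubscribeURL) are standard for SNS. The structure inside Message is defined by DevOps Guru and is not described in this post, so the sketch decodes it as JSON when possible and otherwise leaves it untouched. The function name is my own, and this is not PagerDuty’s implementation.

```python
import json


def parse_sns_delivery(body: str) -> dict:
    """Parse the JSON envelope that Amazon SNS posts to an HTTPS endpoint.

    Note: a production endpoint should also verify the message
    signature before trusting the contents; that step is omitted here.
    """
    envelope = json.loads(body)
    kind = envelope.get("Type")

    if kind == "SubscriptionConfirmation":
        # The endpoint confirms the subscription by visiting this URL.
        return {"kind": kind, "subscribe_url": envelope["SubscribeURL"]}

    if kind == "Notification":
        message = envelope["Message"]
        try:
            # The Message field arrives as a string; decode it if it is JSON.
            payload = json.loads(message)
        except json.JSONDecodeError:
            payload = message
        return {
            "kind": kind,
            "topic_arn": envelope["TopicArn"],
            "payload": payload,
        }

    raise ValueError(f"Unsupported SNS message type: {kind!r}")
```

With the PagerDuty integration you do not need to write any of this yourself. It is shown only to illustrate what travels over the subscription you created in Step 2.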

Conclusion

This post has shown you how to use Amazon DevOps Guru and PagerDuty to set up an architecture that helps you save time and effort otherwise spent on monitoring applications and manually updating static rules and alarms. You can harness the power of ML to find and address operational issues in your AWS account and notify your on-call personnel.

About the Author

Brian Terry is the Senior Partner Solutions Architect and Cloud Management Tools Technical Segment Lead based in Athens, GA. Brian’s career spans almost twenty years, and he enjoys building infrastructure with CloudFormation and creating serverless applications in Go.