AWS Cloud Operations Blog
Deliver ML-powered operational insights to your on-call teams via PagerDuty with Amazon DevOps Guru
Amazon DevOps Guru, now in preview, is an ML-powered cloud operations service that assists you in improving application availability. It’s easy to set up and use, and leverages machine learning models informed by years of operational expertise in building, scaling, and maintaining highly available applications at Amazon.com. DevOps Guru continuously analyzes streams of disparate data and monitors thousands of metrics to establish normal bounds for application behavior. DevOps Guru identifies deviations from normal conditions in your application. It analyzes metrics, logs, events, and traces within your account and surfaces high severity issues. It correlates relevant data and provides recommendations so that you can react quickly to operational issues.
PagerDuty, an AWS Partner Network (APN) Advanced Technology Partner, is an incident management platform. PagerDuty provides features such as reliable notifications, automatic escalations, and on-call scheduling. Using PagerDuty’s DevOps Guru integration, you will be able to detect and fix infrastructure problems quickly.
In this post, I will walk you through how to enable Amazon DevOps Guru in your AWS account and configure PagerDuty to receive operational insights.
Architecture
Here is the architecture you will be creating.
You will perform the following:
- Configure the Amazon DevOps Guru integration in PagerDuty.
- Create an Amazon SNS topic to forward insights from Amazon DevOps Guru to PagerDuty.
- Enable DevOps Guru in your AWS account.
Configure PagerDuty
First, enable the Amazon DevOps Guru integration in your PagerDuty account. If you do not have a PagerDuty account, sign up for one here. Once you are logged into PagerDuty, follow these steps:
Step 1: Configure the Amazon DevOps Guru integration
1. Open your PagerDuty console and select the Services page from the page header.
2. Next, click the New Service button.
3. Create a new service with the name Operational Insights and give it a detailed description. In Integration Settings, search for Amazon DevOps Guru under the Integration Type dropdown.
4. Keep the defaults in place and select the Add Service button.
5. Now, under the newly created service, click the Integrations tab, click the name of the Amazon DevOps Guru integration, and copy the integration URL. You will need this for the next step.
Step 2: Create an Amazon SNS topic and subscribe the PagerDuty integration
1. Navigate to the Amazon SNS topics console and click Create topic.
2. Choose the Standard topic type and give your topic a name like operational-insights. Leave the default settings as they are or configure them to suit your needs, then click Create topic.
3. After the topic has been created, scroll down to the subscriptions panel and click Create subscription.
4. Select HTTPS as your protocol and paste the integration URL you copied from the previous step. Leave the remaining options as the defaults or configure them to meet your needs. Click Create subscription.
Amazon SNS sends a confirmation message to your PagerDuty integration. PagerDuty automatically approves this subscription.
PagerDuty can now receive notifications from DevOps Guru.
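If you prefer to script Step 2 instead of using the console, the following is a minimal sketch using the AWS SDK for Python (boto3). It is not part of the original walkthrough. The function name subscribe_pagerduty, the helper main, and the placeholder URL are hypothetical; replace the URL with the integration URL you copied from PagerDuty. The client is passed in as a parameter so that the logic can be exercised without AWS credentials.

```python
from urllib.parse import urlparse


def subscribe_pagerduty(sns_client, topic_name, integration_url):
    """Create an SNS topic and subscribe an HTTPS endpoint to it.

    sns_client is expected to behave like boto3.client("sns").
    Returns a (topic_arn, subscription_arn) tuple.
    """
    parsed = urlparse(integration_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("integration_url must be an https:// URL")

    # create_topic returns the ARN of the new topic (or of an existing
    # topic with the same name and attributes).
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # Protocol "https" matches the choice made in the console steps above.
    subscription = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="https",
        Endpoint=integration_url,
        ReturnSubscriptionArn=True,
    )
    return topic_arn, subscription["SubscriptionArn"]


def main():
    # boto3 is a third-party package; it is imported here so the function
    # above stays importable without it.
    import boto3

    sns = boto3.client("sns")
    # Placeholder only: use the integration URL copied from PagerDuty.
    url = "https://example.com/replace-with-your-integration-url"
    print(subscribe_pagerduty(sns, "operational-insights", url))
```

Calling main() requires AWS credentials with permission to create topics and subscriptions. Keep the returned topic ARN; you need it when you configure DevOps Guru notifications.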
Step 3: Set up Amazon DevOps Guru
1. Navigate to the Amazon DevOps Guru Console.
2. Click on Settings.
3. Click the Manage button and select the Resources you wish to monitor.
4. Under the SNS notifications section, click Edit. Select the Amazon SNS topic that you just created from the list of topics, then click the Enable button.
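The console steps above can also be approximated with boto3. This is a sketch under assumptions, not the method used in this post: it assumes that the resources you want to monitor are defined by AWS CloudFormation stacks, and the function name enable_devops_guru and the stack name in the usage note are hypothetical.

```python
def enable_devops_guru(guru_client, topic_arn, stack_names):
    """Add CloudFormation stacks to DevOps Guru coverage and register an SNS topic.

    guru_client is expected to behave like boto3.client("devops-guru").
    Returns the ID of the notification channel that was added.
    """
    stack_names = list(stack_names)
    if not stack_names:
        raise ValueError("provide at least one CloudFormation stack name")

    # Assumption: coverage is defined by stack name, not the whole account.
    guru_client.update_resource_collection(
        Action="ADD",
        ResourceCollection={"CloudFormation": {"StackNames": stack_names}},
    )

    # Register the SNS topic so that insights are published to it.
    response = guru_client.add_notification_channel(
        Config={"Sns": {"TopicArn": topic_arn}}
    )
    return response["Id"]
```

A possible call, given AWS credentials and an existing topic, is enable_devops_guru(boto3.client("devops-guru"), topic_arn, ["my-app-stack"]), where my-app-stack stands in for one of your own stack names.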
Using the integration
At this point, Amazon DevOps Guru will start monitoring your resources and learning what’s normal behavior for your applications. When an operational issue occurs, it generates insights with a summary of related anomalies, contextual information about the problem, and (when possible) actionable recommendations for remediation. For example, in the image below, DevOps Guru has noticed an anomaly on a DynamoDB stack in my account.
When I click on the insight's name link, I'm able to see more details about the insight.
Amazon DevOps Guru provides detailed metrics associated with the finding and makes recommendations on ways to remediate the issue.
The insight is sent to PagerDuty via an Amazon SNS notification. PagerDuty creates an incident with all the critical details of the insight.
In PagerDuty, the on-call personnel are able to see information about the insight by clicking on the title. The information includes a URL to the insight, allowing them to view metrics and recommendations for the insight in the Amazon DevOps Guru console.
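To make the flow above more concrete, here is a rough sketch of how any HTTPS receiver might interpret an Amazon SNS delivery. It does not describe PagerDuty's actual implementation. The envelope fields Type, Message, and SubscribeURL are part of the SNS HTTP/HTTPS message format. The inner field names insightName, insightSeverity, and insightUrl are assumptions about the DevOps Guru payload, so check them against a real notification before relying on them. The function name summarize_sns_delivery is hypothetical.

```python
import json


def summarize_sns_delivery(body):
    """Turn one SNS HTTPS delivery body (a JSON string) into a small summary dict."""
    envelope = json.loads(body)
    kind = envelope.get("Type")

    if kind == "SubscriptionConfirmation":
        # A receiver confirms the subscription by visiting SubscribeURL.
        return {"action": "confirm", "subscribe_url": envelope.get("SubscribeURL")}

    if kind != "Notification":
        return {"action": "ignore"}

    # For notifications, the publisher's payload is a string in "Message".
    try:
        insight = json.loads(envelope.get("Message", ""))
    except ValueError:
        insight = None
    if not isinstance(insight, dict):
        return {"action": "ignore"}

    # Assumed DevOps Guru field names; missing keys yield None.
    return {
        "action": "incident",
        "title": insight.get("insightName"),
        "severity": insight.get("insightSeverity"),
        "url": insight.get("insightUrl"),
    }
```

The function reads fields with get so that a payload with different or missing keys produces None values instead of raising. A production receiver would also verify the SNS message signature, which this sketch omits.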
Conclusion
This post has shown you how to use Amazon DevOps Guru and PagerDuty to set up an architecture that will help you save time and effort otherwise spent on monitoring applications and manually updating static rules and alarms. You can harness the power of ML to find and address operational issues in your AWS account and notify your on-call personnel.
About the Author
Brian Terry is the Senior Partner Solutions Architect and Cloud Management Tools Technical Segment Lead based in Athens, GA. Brian's career spans almost twenty years, and he enjoys building infrastructure with CloudFormation and creating serverless applications in Go.