Amazon DevOps Guru is a service powered by machine learning (ML) that is designed to make it easy to improve an application’s operational performance and availability. DevOps Guru helps detect behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers.
DevOps Guru uses ML models informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior, such as increased latency, elevated error rates, and resource constraints, and helps surface critical issues that could cause outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, the likely root cause, and context for when and where the issue occurred. When possible, DevOps Guru also provides recommendations on how to remediate the issue.
With one-click deployment, DevOps Guru automatically ingests operational data from your AWS applications and provides a single dashboard to visualize issues in your operational data. You can get started by enabling DevOps Guru for all resources in your AWS account, for resources in your AWS CloudFormation stacks, or for resources grouped together by AWS tags, with no manual setup or ML expertise required.
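The stack- and tag-based coverage options can also be set programmatically. The following is a minimal Python sketch, assuming the boto3 SDK's `devops-guru` client and its `update_resource_collection` operation. The helper functions, stack names, and tag key/values are hypothetical placeholders; the helpers only build the request dictionary, and only `apply` needs boto3 and configured AWS credentials.

```python
"""Sketch: build requests that scope DevOps Guru coverage to stacks or tags.

Assumes boto3's "devops-guru" client and its update_resource_collection
operation. All names here (helpers, stacks, tags) are hypothetical examples.
"""
from typing import Any, Dict, List, Optional


def cloudformation_collection(stack_names: List[str]) -> Dict[str, Any]:
    # Coverage defined by a list of CloudFormation stack names.
    return {"CloudFormation": {"StackNames": list(stack_names)}}


def tag_collection(app_boundary_key: str, tag_values: List[str]) -> Dict[str, Any]:
    # Coverage defined by a tag key plus the tag values that mark each application.
    return {"Tags": [{"AppBoundaryKey": app_boundary_key, "TagValues": list(tag_values)}]}


def update_request(collection: Dict[str, Any], action: str = "ADD") -> Dict[str, Any]:
    # ADD starts analyzing the listed resources; REMOVE stops analyzing them.
    if action not in ("ADD", "REMOVE"):
        raise ValueError(f"action must be 'ADD' or 'REMOVE', got {action!r}")
    return {"Action": action, "ResourceCollection": collection}


def apply(request: Dict[str, Any], region: Optional[str] = None) -> Dict[str, Any]:
    # Imported lazily so the request builders above work without boto3 installed.
    import boto3

    client = boto3.client("devops-guru", region_name=region)
    return client.update_resource_collection(**request)


if __name__ == "__main__":
    req = update_request(cloudformation_collection(["my-web-app-stack"]))
    print(req)
    # apply(req)  # uncomment to send the request with your AWS credentials
```

Keeping request construction separate from the API call makes it easy to review or test the coverage change before sending it.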
Automatically detect operational issues
Using ML, Amazon DevOps Guru automatically collects and analyzes data such as application metrics, logs, and events to identify behaviors that deviate from normal operating patterns. The service is designed to automatically detect and alert on operational issues and risks, such as impending resource exhaustion, code and configuration changes that may cause outages, memory leaks, under-provisioned compute capacity, and database input/output (I/O) overutilization.
Resolve issues quickly with ML-powered insights
Amazon DevOps Guru helps reduce the time it takes to identify and resolve the root cause of issues by correlating anomalous behavior with operational events. When an issue occurs, DevOps Guru is designed to generate insights with a summary of related anomalies and contextual information about the issue. When possible, it also provides actionable recommendations for remediation.
Easily scale and maintain availability
Amazon DevOps Guru saves you the time and effort involved in manually updating static rules and alarms so you can effectively monitor complex and evolving applications. When you migrate to or adopt new AWS services, DevOps Guru automatically analyzes their metrics, logs, and events, then produces insights that help you adapt to changing behavior and an evolving system architecture.
Reduce noise and alarm fatigue
Amazon DevOps Guru helps developers and IT operators reduce alarm noise and overcome alarm fatigue by using pre-trained ML models to correlate and group related anomalies and surface the most critical alerts. With DevOps Guru, you can reduce the need to manage multiple monitoring tools and alarms, so you can focus on finding the root cause of an issue and remediating it.
How it works
Improve operational performance and availability
Prevent operational incidents before they occur. Amazon DevOps Guru is designed to surface medium- and low-severity findings that affect the reliability of your application over time, such as hitting the limits of Auto Scaling groups, changes in latency patterns, or increased API call volume.
Dynamically discover new resources and metrics
As your application evolves and new supported resources are added, Amazon DevOps Guru is designed to learn patterns for each new metric and alert you with early warnings of operational issues. No more updating or fixing misconfigured alarms: DevOps Guru ingests metrics from these resources and classifies them automatically.
Use DevOps Guru's operational insights to quickly diagnose and remediate issues in AWS resources, including relational databases, such as resource overutilization or misbehaving SQL queries. These insights help reduce mean time to recovery (MTTR) by surfacing relevant information on impacted resources and related anomalies, and they provide recommendations based on contextual data such as logs and relevant events.
Proactive resource management
With DevOps Guru, you can identify when exhaustible resources such as memory, CPU, and disk space will exceed their provisioned capacity. DevOps Guru continuously ingests and analyzes data from your resources and applications running on AWS, and helps you avoid an impending outage by creating a low-noise notification in the dashboard.
“We are always looking for ways to reduce the amount of time our teams spend on resolving operational issues, and we are now using Amazon DevOps Guru and leveraging its ML-powered insights to help us identify, correlate, and remediate operational issues quickly. With the insights Amazon DevOps Guru provides, our teams can now quickly find issues without having to start from scratch trying to root cause problems. Our IT team has significantly reduced our MTTR, and they are saving hours upon hours of time resolving issues—all the while ensuring our customers have the best end-user experience possible.”
Anchal Gupta, Senior Technical Lead, DevOps - HCL
“Customer experience and satisfaction are our top priorities. When multiple sources of alerts and monitoring events are received, it can be challenging and time-consuming to filter through the noise to identify customer-impacting incidents. With Amazon DevOps Guru, we are able to leverage its ML-powered insights to provide clear paths for action to reduce—and in many cases eliminate—the impact issues have on our customers. The Amazon DevOps Guru integration with PagerDuty also provides a direct path to quickly and efficiently deliver recommendations to the right people at the right time, and we anticipate significantly reduced operational downtime as a result.”
Steve Thoennes, Director Infrastructure Hosting Portfolio - Thomson Reuters
“We have more than a dozen AWS accounts and tens of thousands of resources to monitor. Even with Infrastructure as Code and creating dynamic alerts for these services, it is difficult to manage and correlate metrics to quickly resolve issues. With Amazon DevOps Guru, we are confident that the alerts and notifications we receive are accurate from the ML-powered metrics correlated across multiple services. Integrating Amazon DevOps Guru only took minutes to implement, and it was a breeze to integrate with our thousands of AWS CloudFormation stacks. Amazon DevOps Guru has provided insights that help us focus our infrastructure roadmap.”
Jared Williams, Director of DevOps - 605.tv
“Atlassian is excited that our customers are implementing an AIOps strategy using Amazon DevOps Guru to manage the operational performance of their cloud applications. With our new Opsgenie and Jira Service Management integration, the right teams are notified the instant Amazon DevOps Guru discovers a potential issue and prioritizes it by the severity of the incident using ML. This integration ensures that every team can quickly respond to, resolve using ML-powered recommendations, and learn from every incident.”
Emel Dogrusoz, Head of Product, Opsgenie - Atlassian
“PagerDuty is further deepening our partnership with AWS with a new integration with Amazon DevOps Guru. PagerDuty's digital operations management platform was built to drive a shift to DevOps culture, and we are delighted to continue this commitment with this integration. Harnessing DevOps Guru's ML capabilities, PagerDuty provides even more real-time signal-to-action capabilities to our joint customers. Through PagerDuty’s ingestion of the Amazon Simple Notification Service (SNS), AWS customers can take real-time action on operational issues before they become customer-impacting outages.”
Jonathan Rende, SVP of Product - PagerDuty
Blog posts & articles
New – Amazon DevOps Guru Helps Identify Application Errors and Fixes
Easily configure Amazon DevOps Guru across multiple accounts and Regions using AWS CloudFormation StackSets
Nikunj Vaidya & Nuatu Tseggai
AWS re:Invent 2020: Improve application availability with ML-powered insights using Amazon DevOps Guru
Easily improve your application’s operational performance and availability
Instantly get access to the AWS Free Tier.
Get started building with Amazon DevOps Guru in the AWS Management Console.