Amazon DevOps Guru

ML-powered cloud operations service to improve application availability

Amazon DevOps Guru is a service powered by machine learning (ML) that is designed to make it easy to improve an application’s operational performance and availability. DevOps Guru helps detect behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers.

DevOps Guru uses ML models informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior (for example, increased latency, error rates, resource constraints, and others) and helps surface critical issues that could cause potential outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, the likely root cause, and context for when and where the issue occurred. When possible, DevOps Guru also helps provide recommendations on how to remediate the issue.

With one-click deployment, DevOps Guru automatically ingests operational data from your AWS applications and provides a single dashboard to visualize issues in your operational data. You can get started by enabling DevOps Guru for all resources in your AWS account, resources in your AWS CloudFormation Stacks, or resources grouped together by AWS Tags, with no manual setup or ML expertise required.
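
As a rough illustration of scoping coverage to CloudFormation stacks or to a tag, the sketch below builds the arguments for the DevOps Guru UpdateResourceCollection API. The helper name, the stack name, and the tag key and value are hypothetical, and treating stacks and tags as alternatives is an assumption of this sketch. Check the request shape against the current API reference before relying on it.

```python
from typing import Any, Dict, List, Optional


def build_resource_collection_request(
    stack_names: Optional[List[str]] = None,
    tag_key: Optional[str] = None,
    tag_values: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an UpdateResourceCollection call (hypothetical helper)."""
    if stack_names and tag_key:
        # Assumption: this sketch scopes coverage by stacks OR by one tag, not both.
        raise ValueError("choose either CloudFormation stacks or a tag, not both")
    if stack_names:
        collection: Dict[str, Any] = {"CloudFormation": {"StackNames": list(stack_names)}}
    elif tag_key and tag_values:
        # AWS documentation describes a "Devops-guru-" prefix for the tag key;
        # verify the exact rule for your account before use.
        collection = {"Tags": [{"AppBoundaryKey": tag_key, "TagValues": list(tag_values)}]}
    else:
        raise ValueError("provide stack_names, or tag_key together with tag_values")
    return {"Action": "ADD", "ResourceCollection": collection}


# Hypothetical stack name and tag, for illustration only.
by_stack = build_resource_collection_request(stack_names=["my-app-stack"])
by_tag = build_resource_collection_request(
    tag_key="Devops-guru-deployment-application", tag_values=["checkout"]
)
print(by_stack)
print(by_tag)

# With AWS credentials configured, the request could then be sent with boto3:
#   import boto3
#   boto3.client("devops-guru").update_resource_collection(**by_stack)
```

Keeping the request builder separate from the network call makes the payload easy to review before anything changes in the account.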

7,200 AWS resource hours free each for resource group A and B, per month for 3 months, with the AWS Free Tier

Benefits

Detect Issues

Automatically detect operational issues

Using ML, Amazon DevOps Guru automatically collects and analyzes data such as application metrics, logs, and events, and identifies behaviors that deviate from normal operating patterns. The service is designed to automatically detect and alert on operational issues and risks, such as impending resource exhaustion, code and configuration changes that may cause outages, memory leaks, under-provisioned compute capacity, and database input/output (I/O) overutilization.
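
To make "deviation from normal operating patterns" concrete, here is a deliberately simple sketch that flags a metric sample when it sits far from a trailing-window baseline. It is a plain z-score check, not the ML models DevOps Guru uses, and the function name, window size, threshold, and latency values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(values: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose value deviates strongly from the trailing-window baseline."""
    anomalies: List[int] = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up latency samples in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(find_anomalies(latency_ms))  # [7]
```

A static threshold like this is exactly the kind of rule that needs manual tuning as traffic changes, which is the maintenance burden a learned baseline is meant to remove.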

Resolve Issues

Resolve issues quickly with ML-powered insights

Amazon DevOps Guru helps reduce time to identify and resolve the root cause of issues by correlating anomalous behavior and operational events. When an issue occurs, DevOps Guru is designed to generate insights with a summary of related anomalies and contextual information about the issue. When possible, it helps provide actionable recommendations for remediation.

Scale

Easily scale and maintain availability

Amazon DevOps Guru saves you the time and effort involved in manually updating static rules and alarms so you can effectively monitor complex and evolving applications. When you migrate or adopt new AWS services, DevOps Guru automatically analyzes their metrics, logs, and events. Then it produces insights, helping you easily adapt to changing behavior and evolving system architecture.

Reduce noise

Reduce noise and alarm fatigue


Amazon DevOps Guru helps developers and IT operators reduce alarm noise and overcome alarm fatigue by using pre-trained ML models to correlate and group related anomalies and surface the most critical alerts. With DevOps Guru, you can reduce the need to manage multiple monitoring tools and alarms, which means you can focus on the root cause of the issue and remediation.
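
One simple way to picture grouping related anomalies is to cluster them by how close together they occur, so that a burst of alarms becomes a single item to investigate. The sketch below does only that. It is not how DevOps Guru correlates anomalies, and the `Anomaly` type, the `max_gap` parameter, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Anomaly:
    resource: str
    metric: str
    minute: int  # minutes since an arbitrary reference time


def group_anomalies(anomalies: List[Anomaly], max_gap: int = 10) -> List[List[Anomaly]]:
    """Group anomalies whose start times are within max_gap minutes of the previous one."""
    groups: List[List[Anomaly]] = []
    for anomaly in sorted(anomalies, key=lambda a: a.minute):
        if groups and anomaly.minute - groups[-1][-1].minute <= max_gap:
            groups[-1].append(anomaly)  # close in time: same group
        else:
            groups.append([anomaly])  # too far apart: start a new group
    return groups


# Made-up anomalies: three within minutes of each other, two an hour later.
observed = [
    Anomaly("api-gateway", "5XXError", 3),
    Anomaly("orders-db", "CPUUtilization", 0),
    Anomaly("orders-fn", "Duration", 8),
    Anomaly("cache", "Evictions", 60),
    Anomaly("cache", "CPUUtilization", 62),
]
grouped = group_anomalies(observed)
print([len(group) for group in grouped])  # [3, 2]
```

Five raw alarms collapse into two groups here, which is the noise-reduction effect in miniature.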

How it works

Diagram: How Amazon DevOps Guru works

Video: Gain Operational Insights with Amazon DevOps Guru

Use cases

Improve operational performance and availability

Prevent operational incidents before they occur. Amazon DevOps Guru is designed to surface medium- and low-severity findings that affect the reliability of your application over time, such as hitting the limits of auto-scaling groups, changes in latency patterns, or increased API call volume.

Dynamically discover new resources and metrics

As your application evolves and new supported resources are added, Amazon DevOps Guru is designed to learn patterns for each new metric and alert you with early warnings of operational issues. No more updating or fixing misconfigured alarms: DevOps Guru ingests metrics from these resources and classifies them automatically.

Reduce mean-time-to-recovery

With DevOps Guru’s operational insights, you can quickly diagnose and remediate issues for AWS resources, including relational databases, such as overutilization of resources or misbehavior of certain SQL queries. These insights reduce mean-time-to-recovery (MTTR) using relevant information on impacted resources and related anomalies, and provide recommendations using contextual data such as logs and relevant events.

Proactive resource management

With DevOps Guru, you can identify when your exhaustible resources, such as memory, CPU, and disk space, will exceed the provisioned capacity. DevOps Guru continuously ingests and analyzes your resources and applications that run on AWS, and helps you avoid an impending outage by creating a low-noise notification in the dashboard.
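
The idea of predicting when an exhaustible resource will run out can be sketched with a straight-line fit over recent samples, extrapolated to the capacity limit. This is an illustrative least-squares forecast under the assumption of hourly samples and roughly linear growth, not the forecasting method DevOps Guru uses. The function name and the sample values are hypothetical.

```python
from typing import List, Optional


def hours_until_full(usage_pct: List[float], capacity_pct: float = 100.0) -> Optional[float]:
    """Fit a least-squares line to hourly samples and extrapolate to capacity.

    Returns the hours remaining after the last sample, or None when usage is
    not growing or there are too few samples to fit a line.
    """
    n = len(usage_pct)
    if n < 2:
        return None
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(usage_pct) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, usage_pct))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity
    intercept = y_mean - slope * x_mean
    hour_at_capacity = (capacity_pct - intercept) / slope
    return max(0.0, hour_at_capacity - (n - 1))


# Made-up disk usage, in percent, sampled once per hour.
disk_usage = [60, 62, 64, 66, 68]
print(hours_until_full(disk_usage))  # 16.0
```

With usage growing two points per hour from 68 percent, the line reaches 100 percent 16 hours after the last sample, which is the kind of early warning the paragraph above describes.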

Customers

HCL Technologies
“We are always looking for ways to reduce the amount of time our teams spend on resolving operational issues, and we are now using Amazon DevOps Guru and leveraging its ML-powered insights to help us identify, correlate, and remediate operational issues quickly. With the insights Amazon DevOps Guru provides, our teams can now quickly find issues without having to start from scratch trying to root cause problems. Our IT team has significantly reduced our MTTR, and they are saving hours upon hours of time resolving issues—all the while ensuring our customers have the best end-user experience possible.”

Anchal Gupta, Senior Technical Lead, DevOps - HCL

Thomson Reuters
“Customer experience and satisfaction are our top priorities. When multiple sources of alerts and monitoring events are received, it can be challenging and time-consuming to filter through the noise to identify customer-impacting incidents. With Amazon DevOps Guru, we are able to leverage its ML-powered insights to provide clear paths for action to reduce—and in many cases eliminate—the impact issues have on our customers. The Amazon DevOps Guru integration with PagerDuty also provides a direct path to quickly and efficiently deliver recommendations to the right people at the right time, and we anticipate significantly reduced operational downtime as a result.”

Steve Thoennes, Director Infrastructure Hosting Portfolio - Thomson Reuters

605.tv
“We have more than a dozen AWS accounts and tens of thousands of resources to monitor. Even with Infrastructure as Code and creating dynamic alerts for these services, it is difficult to manage and correlate metrics to quickly resolve issues. With Amazon DevOps Guru, we are confident that the alerts and notifications we receive are accurate from the ML-powered metrics correlated across multiple services. Integrating Amazon DevOps Guru only took minutes to implement, and it was a breeze to integrate with our thousands of AWS CloudFormation stacks. Amazon DevOps Guru has provided insights that help us focus our infrastructure roadmap.”

Jared Williams, Director of DevOps - 605.tv

Partners

Atlassian
“Atlassian is excited that our customers are implementing an AIOps strategy using Amazon DevOps Guru to manage the operational performance of their cloud applications. With our new Opsgenie and Jira Service Management integration, the right teams are notified the instant Amazon DevOps Guru discovers a potential issue and prioritizes it by the severity of the incident using ML. This integration ensures that every team can quickly respond to, resolve using ML-powered recommendations, and learn from every incident.”

Emel Dogrusoz, Head of Product, Opsgenie - Atlassian

Learn how you can deliver operational insights directly to your on-call team by integrating Amazon DevOps Guru with Atlassian Opsgenie

PagerDuty
“PagerDuty is further deepening our partnership with AWS with a new integration with Amazon DevOps Guru. PagerDuty's digital operations management platform was built to drive a shift to DevOps culture, and we are delighted to continue this commitment with this integration. Harnessing DevOps Guru's ML capabilities, PagerDuty provides even more real-time signal-to-action capabilities to our joint customers. Through PagerDuty’s ingestion of the Amazon Simple Notification Service (SNS), AWS customers can take real-time action on operational issues before they become customer-impacting outages.”

Jonathan Rende, SVP of Product - PagerDuty

Learn more about delivering ML-powered operational insights to your on-call teams via PagerDuty and Amazon DevOps Guru

Blog posts & articles

New- Amazon DevOps Guru Helps Identify Application Errors and Fixes

December 2020

Harunobu Kameda

Easily configure Amazon DevOps Guru across multiple accounts and Regions using AWS CloudFormation StackSets

December 2020

Nikunj Vaidya & Nuatu Tseggai

AWS re:Invent 2020: Improve application availability with ML-powered insights using Amazon DevOps Guru

December 2020

Jacob Sullivan

Amazon DevOps Guru is powered by pre-trained ML models that encode operational excellence

February 2020

Caner Turkmen, Ravi Turlapati & Tim Januschowski

Automate code reviews
Catch code problems faster and earlier with Amazon CodeGuru

Amazon DevOps Guru features
Check out the product features

Easily improve your application’s operational performance and availability

Sign up for a free account

Instantly get access to the AWS Free Tier. 

Start building in the console

Get started building with Amazon DevOps Guru in the AWS Management Console.