Amazon DevOps Guru

ML-powered cloud operations service to improve application availability

Amazon DevOps Guru is a machine learning (ML) powered service that makes it easy to improve an application’s operational performance and availability. DevOps Guru detects behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers.

DevOps Guru uses machine learning models informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior (such as increased latency, error rates, and resource constraints) and surface critical issues that could cause outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, the likely root cause, and context about when and where the issue occurred. When possible, DevOps Guru also provides recommendations on how to remediate the issue.

DevOps Guru automatically ingests operational data from your AWS applications and provides a single dashboard to visualize issues in your operational data. You can get started with DevOps Guru to improve application availability and reliability with no manual setup or machine learning expertise.

[Figure: Amazon DevOps Guru Dashboard]

Benefits

Automatically detect operational issues

Using machine learning, Amazon DevOps Guru automatically collects and analyzes data such as application metrics, logs, and events, and identifies behaviors that deviate from normal operating patterns. It automatically detects and alerts on operational issues and risks, such as impending resource exhaustion, code and configuration changes that may cause outages, memory leaks, under-provisioned compute capacity, and database I/O overutilization.

Resolve issues quickly with ML-powered insights

Amazon DevOps Guru helps reduce the time to identify and resolve the root cause of issues by correlating anomalous behavior and operational events. When an issue occurs, DevOps Guru generates insights with a summary of related anomalies, contextual information about the issue, and, when possible, actionable recommendations for remediation.

Easily scale and maintain availability

Amazon DevOps Guru saves you the time and effort involved in manually updating static rules and alarms so you can effectively monitor complex and evolving applications. When you migrate or adopt new AWS services, DevOps Guru automatically analyzes their metrics, logs, and events. Then it produces insights, helping you easily adapt to changing behavior and evolving system architecture.

Reduce noise and alarm fatigue


Amazon DevOps Guru helps developers and IT operators reduce alarm noise and overcome alarm fatigue by using pre-trained machine learning models to correlate and group related anomalies and surface the most critical alerts. With DevOps Guru, you can reduce the need to manage multiple monitoring tools and alarms, which means you can focus on the root cause of the issue and remediation.
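
To make the idea of grouping related anomalies concrete, here is a minimal sketch in Python. It is not how DevOps Guru works internally (the service uses pre-trained ML models that are not described here). It only illustrates one simple way to cut alert noise: anomalies that occur close together in time are bundled into one group. The Anomaly class, the group_anomalies function, and the 300-second window are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Anomaly:
    """Hypothetical anomaly record; not a DevOps Guru API type."""
    resource: str
    metric: str
    timestamp: float  # seconds since some reference point


def group_anomalies(anomalies: List[Anomaly],
                    window_seconds: float = 300.0) -> List[List[Anomaly]]:
    """Group anomalies whose timestamps are within window_seconds of the
    previous anomaly in the same group (simple time-proximity heuristic)."""
    groups: List[List[Anomaly]] = []
    for anomaly in sorted(anomalies, key=lambda a: a.timestamp):
        # Extend the current group if this anomaly is close to the last one.
        if groups and anomaly.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(anomaly)
        else:
            groups.append([anomaly])
    return groups


if __name__ == "__main__":
    sample = [
        Anomaly("orders-api", "latency", 0.0),
        Anomaly("orders-db", "cpu", 120.0),
        Anomaly("orders-api", "error_rate", 200.0),
        Anomaly("billing-api", "latency", 1000.0),
    ]
    for group in group_anomalies(sample):
        print([f"{a.resource}:{a.metric}" for a in group])
```

With this heuristic, four raw anomalies would surface as two grouped findings instead of four separate alarms. A real system would also weigh which resources are related, not only when the anomalies occurred.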

How it works

[Diagram: how Amazon DevOps Guru works]

Amazon DevOps Guru Preview

Use cases

Operational audits

You can use Amazon DevOps Guru to get a quick summary of all the operationally significant events that have been identified, sorted by severity. Using the System Health Dashboard, you can search for issues in specific applications, identify trends, and decide where developers should spend their time and resources.

Proactive resource exhaustion planning

Build predictive alarming for exhaustible resources such as memory, CPU, and disk space. Amazon DevOps Guru forecasts when resource utilization will exceed the provisioned capacity, and informs you by creating a notification in the dashboard, helping you avoid an impending outage.
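
As a rough illustration of what forecasting resource exhaustion can look like, the Python sketch below fits a straight line to evenly spaced utilization samples and estimates when that line crosses the provisioned capacity. This is an assumption-laden simplification, not DevOps Guru's actual forecasting method, which is not documented here. The function name hours_until_exhaustion and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def hours_until_exhaustion(samples: Sequence[float],
                           capacity: float,
                           interval_hours: float = 1.0) -> Optional[float]:
    """Estimate hours from the last sample until utilization reaches capacity.

    Assumes samples are evenly spaced (interval_hours apart) and that the
    trend is roughly linear. Returns None if the fitted trend is flat or
    decreasing, and 0.0 if the fitted line is already at or above capacity.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of y = intercept + slope * x, x = 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no predicted exhaustion

    intercept = mean_y - slope * mean_x
    crossing_index = (capacity - intercept) / slope
    remaining_steps = crossing_index - (n - 1)
    return max(remaining_steps, 0.0) * interval_hours


if __name__ == "__main__":
    # Disk usage in percent, sampled hourly (made-up illustrative numbers).
    usage = [50.0, 60.0, 70.0, 80.0]
    print(hours_until_exhaustion(usage, capacity=100.0))
```

A predictive alarm built on this idea would notify you when the estimated time to exhaustion drops below a threshold you choose, rather than waiting for utilization to cross a static limit.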

Preventative maintenance

With Amazon DevOps Guru, you can prevent incidents before they occur. DevOps Guru flags medium- and low-severity findings that might not be critical but that, if left alone, can worsen over time and affect the availability of your application. This helps you prioritize work and avoid unforeseen downtime. For example, DevOps Guru notifies you about hitting the limits of your Auto Scaling groups, changes in latency patterns, or increased API call volume. DevOps Guru also identifies AWS best practices to help you increase the overall availability of your application.

Customers

NextRoll
“We run thousands of EC2 instances and I am always looking for ways to reduce the time my team spends on resolving operational issues. We are excited to use Amazon DevOps Guru and leverage its ML-powered insights to help us identify, correlate and remediate operational issues. This will help my team save hours and reduce our mean time to recovery (MTTR).”

- Valentino Volonghi
CTO, NextRoll

SmugMug
"My team follows an ops-for-life motto, and we are always on the lookout for ways to automate our manual activities. With Amazon DevOps Guru, we hope to realize that goal and let AIOps take over many of our day-to-day tasks, so my team can focus on IT innovation. We are now not only meeting the needs of the business but able to exceed them since we have more time to focus on what matters most – delivering value for our organization and our customers."

- Andrew Shieh
SmugMug’s Operations Director

Thomson Reuters
“Customer experience is vital to us. Dealing with multiple sources of alerts for availability, performance, and change requests can be a challenge when trying to prevent and mitigate incidents impacting our customers. We are excited to use Amazon DevOps Guru and leverage its ML-powered insights to provide clear paths for action. This allows us to mitigate issues quickly and avoid events that impact customers. The integration with PagerDuty is a bonus, as we can have recommendations delivered to the right people timely and efficiently.”

- Steve Thoennes
Director Infrastructure Hosting Portfolio

Partners

Atlassian
"Atlassian is proud to support Amazon on the launch of DevOps Guru and help empower teams to deploy code and operate services with confidence. With our new Opsgenie and Jira Service Management integration, the right teams can be immediately notified the instant DevOps Guru predicts a potential issue, or determines an incident has occurred. DevOps Guru provides a new dimension of insight, and Atlassian ensures the fastest response."

- Emel Dogrusoz
Head of Product, Opsgenie

PagerDuty
“PagerDuty was built to drive the move to a DevOps culture by automating the entire incident response lifecycle with resolution. We’re excited to continue this commitment to DevOps with our latest integration with Amazon DevOps Guru. Leveraging Amazon’s decades of operational excellence and DevOps Guru’s machine learning capabilities, PagerDuty provides even more real-time signal-to-action capabilities to our joint customers. Through PagerDuty’s ingestion of DevOps Guru’s Amazon Simple Notification Service (SNS) notifications, AWS customers can take real-time action on operational issues before they become customer-impacting outages.”

- Jonathan Rende
SVP of Product

Automate code reviews
Catch code problems faster and earlier with Amazon CodeGuru

Check out the product features

Easily build sophisticated personalization capabilities
into your applications

Learn more 
Sign up for a free account

Instantly get access to the AWS Free Tier. 

Sign up 
Start building in the console

Get started building with Amazon DevOps Guru in the AWS Management Console.

Sign in