Amazon DevOps Guru
ML-powered cloud operations service to improve application availability
Amazon DevOps Guru is a machine learning (ML) powered service that makes it easy to improve an application’s operational performance and availability. DevOps Guru detects behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers.
DevOps Guru uses machine learning models informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior (such as increased latency, error rates, and resource constraints) and surface critical issues that could cause outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, the likely root cause, and context about when and where the issue occurred. When possible, DevOps Guru also provides recommendations on how to remediate the issue.
DevOps Guru automatically ingests operational data from your AWS applications and provides a single dashboard to visualize issues in your operational data. You can get started with DevOps Guru to improve application availability and reliability with no manual setup or machine learning expertise.
Automatically detect operational issues
Using machine learning, Amazon DevOps Guru automatically collects and analyzes data such as application metrics, logs, and events, and identifies behaviors that deviate from normal operating patterns. It automatically detects and alerts on operational issues and risks, such as impending resource exhaustion, code and configuration changes that may cause outages, memory leaks, under-provisioned compute capacity, and database I/O overutilization.
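To make "behaviors that deviate from normal operating patterns" concrete, here is a minimal sketch of baseline-deviation detection on a single metric. It is an illustration only: the models DevOps Guru actually uses are not described here, and the function name `find_anomalies`, the trailing-window size, and the threshold of three standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, baseline_size=20, threshold=3.0):
    """Return indices of points that deviate strongly from a trailing baseline.

    Hypothetical helper for illustration; not part of any AWS API.
    Each point is compared with the mean and standard deviation of the
    `baseline_size` points that precede it.
    """
    anomalies = []
    for i in range(baseline_size, len(values)):
        baseline = values[i - baseline_size:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to measure against; skip.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: steady latency around 100 ms followed by one spike.
latencies_ms = [100, 102, 98, 101, 99] * 4 + [250]
spikes = find_anomalies(latencies_ms)
```

A static alarm would need a hand-picked limit for every metric; a baseline-relative check like this adapts to whatever "normal" looks like for each series, which is the general idea the paragraph above describes.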
Resolve issues quickly with ML-powered insights
Amazon DevOps Guru helps reduce the time to identify and resolve the root cause of issues by correlating anomalous behavior and operational events. When an issue occurs, DevOps Guru generates insights with a summary of related anomalies, contextual information about the issue, and, when possible, actionable recommendations for remediation.
Easily scale and maintain availability
Amazon DevOps Guru saves you the time and effort involved in manually updating static rules and alarms so you can effectively monitor complex and evolving applications. When you migrate or adopt new AWS services, DevOps Guru automatically analyzes their metrics, logs, and events. Then it produces insights, helping you easily adapt to changing behavior and evolving system architecture.
Reduce noise and alarm fatigue
Amazon DevOps Guru helps developers and IT operators reduce alarm noise and overcome alarm fatigue by using pre-trained machine learning models to correlate and group related anomalies and surface the most critical alerts. With DevOps Guru, you can reduce the need to manage multiple monitoring tools and alarms, which means you can focus on the root cause of the issue and remediation.
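As a rough illustration of how grouping related anomalies reduces noise, the sketch below clusters anomalies whose start times fall close together, so that one group can be surfaced instead of several separate alerts. This is an assumption-laden simplification: DevOps Guru's correlation uses pre-trained ML models, not a fixed time window, and `group_anomalies` and `window_seconds` are hypothetical names.

```python
def group_anomalies(anomalies, window_seconds=300):
    """Group anomalies that start within `window_seconds` of the previous one.

    `anomalies` is a list of (start_time_seconds, metric_name) tuples.
    Hypothetical helper for illustration; not part of any AWS API.
    """
    groups = []
    for start, metric in sorted(anomalies):
        # Compare with the start time of the most recent anomaly in the last group.
        if groups and start - groups[-1][-1][0] <= window_seconds:
            groups[-1].append((start, metric))
        else:
            groups.append([(start, metric)])
    return groups


# Three anomalies: two close together, one much later.
raw = [(0, "latency"), (120, "error_rate"), (1000, "cpu")]
grouped = group_anomalies(raw)
# grouped holds two groups, so an operator would see two items instead of three.
```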
How it works
You can use Amazon DevOps Guru to get a quick summary of all the operationally significant events that have been identified, sorted by their severity. Using the System Health Dashboard, you can search for issues in specific applications, identify trends, and decide where developers should spend their time and resources.
Proactive resource exhaustion planning
Build predictive alarming for exhaustible resources such as memory, CPU, and disk space. Amazon DevOps Guru forecasts when resource utilization will exceed the provisioned capacity, and informs you by creating a notification in the dashboard, helping you avoid an impending outage.
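The idea of forecasting when utilization will exceed provisioned capacity can be sketched with a simple linear trend. This is only a minimal example under stated assumptions: it fits a least-squares line to recent samples and extrapolates, which is not necessarily how DevOps Guru forecasts, and the function name `hours_until_exhaustion` and the sample data are hypothetical.

```python
def hours_until_exhaustion(samples, capacity):
    """Estimate hours until a linear utilization trend reaches `capacity`.

    `samples` is a list of (hour, utilization) pairs. Returns the number of
    hours after the last sample, or None if the trend is flat or decreasing.
    Hypothetical helper for illustration; not part of any AWS API.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [t for t, _ in samples]
    ys = [u for _, u in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples share one timestamp; no trend to fit
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # utilization is not growing
    intercept = y_mean - slope * x_mean
    crossing_hour = (capacity - intercept) / slope
    return max(0.0, crossing_hour - xs[-1])


# Disk usage in GB, sampled hourly, growing by 2 GB per hour toward 100 GB.
disk_gb = [(0, 50), (1, 52), (2, 54), (3, 56)]
remaining = hours_until_exhaustion(disk_gb, capacity=100)
```

With the sample data the fitted line grows by 2 GB per hour from 50 GB, so the estimate is 22 hours after the last sample. Real utilization is rarely this linear, which is one reason a learned forecast is preferable to a hand-written rule.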
With Amazon DevOps Guru, you can prevent incidents before they occur. DevOps Guru flags medium- and low-severity findings that might not be critical but, if left alone, could worsen over time and affect the availability of your application. This helps you prioritize and avoid unforeseen downtime. For example, DevOps Guru notifies you about hitting the limits of your Auto Scaling groups, changes in latency patterns, or increased API call volume. DevOps Guru also identifies AWS best practices to help you increase the overall availability of your application.
“We run thousands of EC2 instances and I am always looking for ways to reduce the time my team spends on resolving operational issues. We are excited to use Amazon DevOps Guru and leverage its ML-powered insights to help us identify, correlate and remediate operational issues. This will help my team save hours and reduce our mean time to recovery (MTTR).”
- Valentino Volonghi
"My team follows an ops-for-life motto, and we are always on the lookout for ways to automate our manual activities. With Amazon DevOps Guru, we hope to realize that goal and let AIOps take over many of our day-to-day tasks, so my team can focus on IT innovation. We are now not only meeting the needs of the business but able to exceed them since we have more time to focus on what matters most – delivering value for our organization and our customers."
- Andrew Shieh
SmugMug’s Operations Director
“Customer experience is vital to us. Dealing with multiple sources of alerts for availability, performance, and change requests can be a challenge when trying to prevent and mitigate incidents impacting our customers. We are excited to use Amazon DevOps Guru and leverage its ML-powered insights to provide clear paths for action. This allows us to mitigate issues quickly and avoid events that impact customers. The integration with PagerDuty is a bonus, as we can have recommendations delivered to the right people timely and efficiently.”
- Steve Thoennes
Director Infrastructure Hosting Portfolio
"Atlassian is proud to support Amazon on the launch of DevOps Guru and help empower teams to deploy code and operate services with confidence. With our new Opsgenie and Jira Service Management integration, the right teams can be immediately notified the instant DevOps Guru predicts a potential issue, or determines an incident has occurred. DevOps Guru provides a new dimension of insight, and Atlassian ensures the fastest response."
- Emel Dogrusoz
Head of Product, Opsgenie
"PagerDuty was built to drive the move to a DevOps culture by automating the entire incident response lifecycle with resolution. We’re excited to continue this commitment to DevOps with our latest integration with Amazon DevOps Guru. Leveraging Amazon’s decades of operational excellence and DevOps Guru’s machine learning capabilities, PagerDuty provides even more real-time signal-to-action capabilities to our joint customers. Through PagerDuty’s ingestion of DevOps Guru’s Amazon Simple Notification Service (SNS) notifications, AWS customers can take real-time action on operational issues before they become customer-impacting outages.”
- Jonathan Rende
SVP of Product
New- Amazon DevOps Guru Helps Identify Application Errors and Fixes
Easily configure Amazon DevOps Guru across multiple accounts and Regions using AWS CloudFormation StackSets
Nikunj Vaidya & Nuatu Tseggai
AWS re:Invent 2020: Improve application availability with ML-powered insights using Amazon DevOps Guru
Automate code reviews
Catch code problems faster and earlier with Amazon CodeGuru
Easily build sophisticated personalization capabilities into your applications
Instantly get access to the AWS Free Tier.
Get started building with Amazon DevOps Guru in the AWS Management Console.