Amazon DevOps Guru FAQs

General

Amazon DevOps Guru is a service powered by machine learning (ML) that is designed to make it easy to improve an application’s operational performance and availability. DevOps Guru helps detect behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers. DevOps Guru uses ML models informed by years of Amazon.com and AWS operational excellence to help identify anomalous application behavior (for example, increased latency, error rates, resource constraints, and others) and surface critical issues that could cause potential outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, and context for when and where the issue occurred. When possible, DevOps Guru is designed to also provide recommendations on how to remediate the issue.

Amazon DevOps Guru is designed to save you hours—if not days—of time and effort spent detecting, debugging, and resolving operational issues, enabling you to effectively monitor complex and evolving applications. It helps avoid common oversights and errors in monitoring, such as missing alarms, which cause application downtime. When operational issues occur, DevOps Guru saves debugging time by fetching relevant and specific information from a large number of data sources. DevOps Guru generates operational insights to alert you of the issue, with a summary of related anomalies, contextual information about why and when the issue occurred, along with recommendations on how to remediate issues and reduce application downtime.

Amazon DevOps Guru’s ML models benefit from more than 20 years of operational expertise in building, scaling, and maintaining universally available applications for Amazon.com. DevOps Guru is designed to automatically ingest and analyze metrics like latency, error rates, and request rates for all resources to establish normal operating bounds. DevOps Guru then uses a pre-trained ML model to identify deviations from the established baseline. When it identifies anomalous application behavior like increased latency, error rates, or resource constraints that could cause potential outages or service disruptions, it alerts operators with issue details like the resources involved, the issue timeline, and other related events to help them quickly understand the potential impact and likely causes of the issue. It is also designed to provide options for remediation or mitigation. Developers can then use those suggestions from DevOps Guru to reduce time to resolution when issues arise and improve application availability and reliability, with no manual configuration setup and no ML expertise required. DevOps Guru can be used as a standalone service, and also integrates with partner applications from PagerDuty and Atlassian along with AWS Systems Manager OpsCenter.
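The idea of establishing normal operating bounds and then flagging deviations can be illustrated with a deliberately simple sketch. This is not the DevOps Guru model, whose details are not described here. It assumes a plain mean and standard deviation band, and the function names, the threshold of three standard deviations, and the latency samples are all hypothetical.

```python
from statistics import mean, stdev


def normal_bounds(history, k=3.0):
    """Return (low, high) as mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def find_anomalies(history, recent, k=3.0):
    """Return the recent values that fall outside the baseline bounds."""
    low, high = normal_bounds(history, k)
    return [value for value in recent if value < low or value > high]


# Hypothetical latency samples in milliseconds (illustration only).
baseline_latency_ms = [100, 102, 98, 101, 99, 103, 97, 100]
recent_latency_ms = [101, 99, 180]

print(find_anomalies(baseline_latency_ms, recent_latency_ms))  # [180]
```

With these samples the baseline mean is 100 ms and the sample standard deviation is 2 ms, so only the 180 ms reading falls outside the 94 to 106 ms band.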

With a few clicks, you can enable Amazon DevOps Guru in the AWS Management Console. DevOps Guru provides you with an onboarding wizard that helps you quickly configure the analysis coverage for your AWS resources. Once enabled, DevOps Guru is designed to continuously analyze the operational data for your AWS resources based on your selection and produces insights whenever it detects ongoing or emergent operational issues.

You can set your analysis coverage boundary to your entire AWS account, to specific AWS CloudFormation stacks, or to a resource grouping that you define with AWS tags. Based on your selection, DevOps Guru analyzes the operational data for all supported AWS resources in the chosen coverage boundary.
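For readers who configure coverage programmatically rather than through the console, the stack-based and tag-based boundaries can be sketched as request payloads. This sketch only builds plain dictionaries. The shapes follow our understanding of the `ResourceCollection` argument accepted by the DevOps Guru `UpdateResourceCollection` API (for example, boto3's `update_resource_collection(Action="ADD", ResourceCollection=...)`), so check the API reference before relying on them. The helper name, stack names, and tag key and values are hypothetical.

```python
def coverage_boundary(stack_names=None, tag_key=None, tag_values=None):
    """Build a ResourceCollection-style dict for exactly one boundary type."""
    if bool(stack_names) == bool(tag_key):
        raise ValueError("pass either stack_names or tag_key, not both or neither")
    if stack_names:
        return {"CloudFormation": {"StackNames": list(stack_names)}}
    if not tag_values:
        raise ValueError("tag_values is required when tag_key is given")
    return {"Tags": [{"AppBoundaryKey": tag_key, "TagValues": list(tag_values)}]}


# Hypothetical stack names and tag key/value (illustration only).
by_stack = coverage_boundary(stack_names=["checkout-prod", "orders-prod"])
by_tag = coverage_boundary(tag_key="devops-guru-app", tag_values=["checkout"])

print(by_stack)
print(by_tag)
```

The helper treats stacks and tags as alternatives, matching the way the text describes choosing a single kind of coverage boundary.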

When you add new resources to your coverage boundary selection, DevOps Guru automatically starts analyzing the additional resources. Similarly, DevOps Guru stops analyzing and billing for any resources when you remove them from your account or CloudFormation stack.

Amazon DevOps Guru is designed to automatically detect operational issues such as missing or misconfigured alarms, early signs of resource exhaustion, and code and configuration changes that could lead to outages. DevOps Guru uses ML to correlate anomalies in metrics and logs with operational events and provides you with contextual insights to help you focus on the right remediation steps. DevOps Guru also correlates and groups related application and infrastructure anomalies, such as web application latency spikes, disks running out of space, bad code deployments, or memory leaks, to reduce false and redundant alarms so you can focus on high-severity issues.

At launch, Amazon DevOps Guru can use data from Amazon CloudWatch, AWS Config, AWS Systems Manager OpsCenter, AWS CloudFormation, and AWS X-Ray. Amazon DevOps Guru is also integrated with partner operations monitoring and incident management solutions like Atlassian Opsgenie and PagerDuty.

If you use AWS Systems Manager OpsCenter, Amazon DevOps Guru operational insights can be surfaced directly within the OpsCenter dashboard as OpsItems.

Amazon DevOps Guru uses encryption in transit and at rest to protect your content during ingestion and data analysis.

Our training data was generated by internal AWS services and infrastructure. 

Operational Insights

Amazon DevOps Guru operational insights aggregate the information needed to investigate and remediate an operational issue directly in the DevOps Guru console. An insight is composed of three main sections. It highlights the anomalous metrics and logs related to the operational issue, with graphs to easily visualize abnormal system and application behavior. The insight also includes contextual information such as relevant events and log snippets so you can easily understand the scope and issue timeline. Operational Insights also include recommendations on actions you can take to remediate the issue.

You can configure Amazon DevOps Guru to create an OpsItem in AWS Systems Manager OpsCenter for each insight that it generates. You can also configure DevOps Guru to deliver its insights through Amazon SNS, which you can consume in incident management tools such as PagerDuty and Atlassian Opsgenie.
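The two delivery options above can also be set up through the API. The sketch below only builds the request payloads as plain dictionaries. The shapes follow our understanding of the `Config` argument of `AddNotificationChannel` and the `ServiceIntegration` argument of `UpdateServiceIntegration` in the DevOps Guru API, so verify them against the API reference. The helper names and the topic ARN are hypothetical.

```python
def sns_channel_config(topic_arn):
    """Payload intended for add_notification_channel(Config=...)."""
    if not topic_arn.startswith("arn:"):
        raise ValueError("expected an SNS topic ARN")
    return {"Sns": {"TopicArn": topic_arn}}


def ops_center_integration(enabled):
    """Payload intended for update_service_integration(ServiceIntegration=...)."""
    status = "ENABLED" if enabled else "DISABLED"
    return {"OpsCenter": {"OptInStatus": status}}


# Hypothetical topic ARN (illustration only).
print(sns_channel_config("arn:aws:sns:us-east-1:123456789012:devops-guru-insights"))
print(ops_center_integration(True))
```

Keeping payload construction separate from the network call makes the configuration easy to review and test before it is sent with real credentials.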

Once enabled, Amazon DevOps Guru starts baselining your application, which may take from a few minutes to an hour depending on the number of resources being analyzed. After baselining, DevOps Guru analyzes your resources continuously and produces insights when it detects anomalous behavior.

DevOps Guru for RDS

Amazon DevOps Guru for RDS is an ML-powered capability in Amazon DevOps Guru that is designed to automatically detect and diagnose performance and operational issues within a database, enabling developers to resolve issues in minutes rather than days. DevOps Guru for RDS expands the capabilities of DevOps Guru to detect, diagnose, and remediate a wide variety of database-related issues in Amazon RDS (for example, resource over-utilization and misbehaving SQL queries). When an issue occurs, Amazon DevOps Guru for RDS immediately notifies developers and provides diagnostic information, details on the extent of the problem, and intelligent remediation recommendations to help customers quickly resolve database-related performance bottlenecks and operational issues.

Amazon DevOps Guru for RDS is designed to remove manual effort and shorten the time (from hours and days to minutes) needed to detect and resolve hard-to-find performance bottlenecks in your relational database workload. You can enable DevOps Guru for RDS for every Amazon Aurora and RDS for PostgreSQL database, and it will automatically detect performance issues for your workloads, alert you to each issue, explain its findings, and recommend actions to resolve them. DevOps Guru for RDS helps make database administration more accessible to non-experts and assists database experts so that they can manage even more databases.

Amazon DevOps Guru for RDS analyzes telemetry data collected by Amazon RDS Performance Insights (PI). DevOps Guru for RDS does not use any of your data stored in the database in its analysis. DevOps Guru for RDS looks for problematic patterns in PI telemetry using a combination of rules and ML-based techniques, and alerts customers when such patterns are detected.

To get started, turn on Amazon RDS Performance Insights in the Amazon RDS console and navigate to the Amazon DevOps Guru console to enable the service for your Amazon Aurora resources, other supported resources, or your entire account. You can also turn on Amazon DevOps Guru for RDS for an Amazon Aurora database while creating a new database or modifying an existing one in the Amazon RDS console. You can also enable Amazon DevOps Guru for RDS from the Performance Insights (PI) page or from the database details page. With DevOps Guru, you can set your analysis coverage boundary to your entire AWS account, to specific AWS CloudFormation stacks, or to a resource grouping that you define with AWS tags.

Amazon DevOps Guru for RDS is designed to identify a wide range of performance issues that may affect application service quality, such as lock pile-ups, connection storms, SQL regressions, CPU and I/O contention, memory issues, or misconfigured parameters.

Amazon RDS Performance Insights is a database performance tuning and monitoring feature that collects and presents a visual representation of the Amazon RDS database performance metrics, helping you quickly assess the health of your database, and determine when and where to take action. Amazon DevOps Guru for RDS monitors those metrics, detects when your database is experiencing performance issues, analyzes the metrics, and then tells you what’s wrong and what you can do about it.

DevOps Guru for Serverless

Amazon DevOps Guru for Serverless is a new ML-powered capability in Amazon DevOps Guru designed to automatically detect and diagnose performance and operational issues for serverless applications built using AWS resources. DevOps Guru for Serverless expands the capabilities of DevOps Guru to detect, diagnose, and recommend remediation for issues in serverless applications (for example, latency degradation and resource exhaustion). It provides Reactive Insights for ongoing issues impacting the application so you can resolve them more quickly. In addition, it provides Proactive Insights to flag potential issues with your applications and infrastructure early, enabling you to respond faster and reduce downtime and operational costs.

Amazon DevOps Guru for Serverless allows you to monitor your serverless applications for performance and operational issues. There is no manual setup, ML expertise, or deep serverless expertise required. The service is designed to shorten the time (from hours to minutes) needed to detect and resolve hard-to-find reliability, performance, and operational issues for your serverless applications. DevOps Guru for Serverless also detects potential issues early, enabling you to mitigate them before they impact users.

Amazon DevOps Guru for Serverless automatically ingests and analyzes metrics and logs for all resources of the serverless application to establish normal operating bounds and then detects deviations from the established baseline. When DevOps Guru identifies that the application is in an anomalous state, it alerts operators of the issue with relevant details like resources involved, the issue timeline, and related events to help them quickly understand the potential impact and likely causes of the issue. It is also designed to provide options for remediation or mitigation.

With a few clicks, you can start monitoring your serverless applications by enabling Amazon DevOps Guru on the AWS account that hosts your serverless application. You can set the coverage boundary to your entire AWS account, to specific AWS CloudFormation stacks, or to a resource grouping that you define with AWS tags.

DevOps Guru for Serverless uses ML to correlate anomalies in metrics and logs with operational events and provides you with contextual insights to help you focus on the right remediation steps. In addition, DevOps Guru for Serverless detects potential issues early so you can mitigate them before they impact your applications. There are three types of proactive insights:

  • Resources setup: Amazon DevOps Guru for Serverless detects that the application has resources set up in a way that does not follow AWS best practices. For example, consider a Lambda-based application with an API Gateway endpoint. The Lambda function receives more invocations than its currently provisioned concurrency can serve. This leads to continuous spillover of requests, which causes cold starts and, consequently, degraded latency and potentially higher costs. DevOps Guru detects this issue and proactively recommends increasing the Lambda function's provisioned concurrency.
  • Resource exhaustion: Amazon DevOps Guru for Serverless detects that, based on application usage trends, some resources are at risk of reaching their limit. For example, an Elasticsearch node has a slow memory leak that has been steadily growing. DevOps Guru detects this, predicts that memory will soon reach its capacity limit, and creates a Proactive Insight recommending that you fix the memory build-up.
  • Resources utilization: Amazon DevOps Guru for Serverless detects that the application has resources that are underutilized. For example, a DynamoDB table for an application has provisioned write capacity units that are significantly higher than what is actually consumed. DevOps Guru detects this and recommends scaling back the table's provisioned write capacity.
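The resource exhaustion case in the list above amounts to trend extrapolation. As a rough illustration only, and not the method DevOps Guru uses, the sketch below fits a least-squares line to hourly usage samples and estimates how many hours remain before a limit is reached. The function name, the sample values, and the limit are hypothetical.

```python
def hours_until_limit(samples, limit):
    """Estimate hours until `limit` from evenly spaced hourly samples.

    Fits a least-squares line to get the growth rate, then extrapolates
    from the most recent sample. Returns None if usage is not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    return max(0.0, (limit - samples[-1]) / slope)


# Hypothetical memory usage in percent, sampled once per hour.
memory_pct = [50, 52, 54, 56, 58]
print(hours_until_limit(memory_pct, limit=80))  # 11.0
```

Here usage grows by 2 percentage points per hour, so 22 points of headroom correspond to about 11 hours. A real detector would need to handle noise, seasonality, and non-linear growth.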

Amazon DevOps Guru for Serverless provides Reactive Insights for ongoing issues impacting the application, such as latency degradation and 5xx errors, so you can resolve them quickly. It also provides Proactive Insights to flag potential issues with your applications and infrastructure early, enabling you to respond quickly and help reduce costly downtime or operating costs.

Pricing and Billing

With Amazon DevOps Guru, you only pay for what you use. There is no up-front commitment or minimum fee. After you enable DevOps Guru and specify the applications you want to monitor, DevOps Guru starts analyzing the operational data for the resources that these applications use. There are two components that determine your bill: charges for AWS resource analysis, and charges for DevOps Guru API calls. For more details, please refer to our pricing page.

The AWS resource types (for example, Amazon S3 bucket or Amazon EC2 instance) analyzed by DevOps Guru are categorized into two pricing groups. The rate you’re charged for a specific AWS resource type depends on its price group: A or B.

No. You pay for the number of AWS resource hours analyzed, for each active resource. A resource is active only if it produces metrics, events, or log entries within an hour.
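The active-resource rule can be made concrete with a small sketch. It assumes you already have, for each resource, a count of metrics, events, and log entries per hour. The function name, resource names, and counts are hypothetical, and the service's actual metering may differ in detail.

```python
def active_hours(activity_by_hour):
    """Count the hours in which a resource produced any activity."""
    return sum(1 for count in activity_by_hour if count > 0)


# Hypothetical per-hour counts of metrics, events, and log entries.
activity = {
    "orders-fn": [5, 0, 0, 12, 3, 0],
    "idle-bucket": [0, 0, 0, 0, 0, 0],
}

billable = {name: active_hours(hours) for name, hours in activity.items()}
print(billable)  # {'orders-fn': 3, 'idle-bucket': 0}
```

Under this rule the idle resource accrues no resource hours, while the function accrues only the three hours in which it produced data.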

DevOps Guru analyzes more than 25 different AWS resource types (for example, Amazon S3 bucket and Amazon EC2 instance), with support for additional resource types coming soon.

Instead of choosing specific AWS resources for analysis, you specify the resource analysis coverage boundary. Based on your selection, DevOps Guru analyzes the operational data for all supported AWS resources in your coverage boundary. You can choose the entire account, specific AWS CloudFormation stacks, or a resource grouping that you define with AWS tags as your coverage boundary. When you add new resources to your coverage boundary (account or CloudFormation stack), DevOps Guru automatically starts analyzing the additional resources. Similarly, DevOps Guru stops analyzing and billing for any resources that you remove from the account or CloudFormation stack that DevOps Guru is analyzing.

Amazon DevOps Guru for RDS is offered to customers at no additional charge, as part of the existing price that DevOps Guru charges customers for RDS resources. DevOps Guru segments the resource types it evaluates into two groups. Group A includes AWS Lambda and Amazon S3, and Group B includes Amazon RDS, Amazon EC2, Amazon Redshift clusters, and 25 other AWS resource types. Group A is priced at $0.0028 per resource per hour (equates to approximately $2 per resource for 30 days). Group B is priced at $0.0042 per resource per hour (equates to approximately $3 per resource for 30 days). For more details, please refer to our pricing page.
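The approximate monthly figures quoted above follow directly from the hourly rates. The sketch below reproduces that arithmetic for a 30-day month of continuously active resources. The rates are the ones stated in the text, while the function name is hypothetical. Confirm current rates on the pricing page.

```python
# Hourly rates per resource, as quoted in the text above.
RATE_PER_HOUR = {"A": 0.0028, "B": 0.0042}
HOURS_IN_30_DAYS = 24 * 30  # 720


def monthly_charge(group, resource_count, hours=HOURS_IN_30_DAYS):
    """Charge in USD for resources that are active for every hour given."""
    return round(RATE_PER_HOUR[group] * resource_count * hours, 4)


print(monthly_charge("A", 1))  # about 2.016, quoted as roughly $2
print(monthly_charge("B", 1))  # about 3.024, quoted as roughly $3
```

Resources that are active for fewer hours cost proportionally less, because charges accrue per active resource hour.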

You can use the DevOps Guru cost estimator tool to determine resource analysis charges. Your selected resources are scanned to create a monthly cost estimate. The cost estimator's default is to assume that the analyzed active resources are utilized 100 percent of the time. You can change this percentage for each analyzed service based on your estimated usage to create an updated monthly cost estimate.
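The effect of the utilization percentage can be sketched as follows. This is not the cost estimator tool itself, only an illustration of the calculation it describes, using the hourly rates quoted earlier in this section and a 30-day month. The function name, service labels, and resource counts are hypothetical.

```python
# Hourly rates per resource, as quoted in this section.
RATE_PER_HOUR = {"A": 0.0028, "B": 0.0042}
HOURS_PER_MONTH = 24 * 30


def estimate_monthly_cost(resources, utilization_pct=None):
    """Estimate monthly cost in USD.

    resources: {service: (price_group, resource_count)}
    utilization_pct: optional {service: percent}; defaults to 100 percent.
    """
    utilization_pct = utilization_pct or {}
    total = 0.0
    for service, (group, count) in resources.items():
        pct = utilization_pct.get(service, 100)
        total += RATE_PER_HOUR[group] * count * HOURS_PER_MONTH * pct / 100
    return round(total, 2)


# Hypothetical inventory: 10 Lambda functions (group A), 2 RDS instances (group B).
inventory = {"lambda": ("A", 10), "rds": ("B", 2)}

print(estimate_monthly_cost(inventory))                  # default: 100 percent
print(estimate_monthly_cost(inventory, {"lambda": 50}))  # Lambda active half the time
```

Lowering the utilization for a service reduces only that service's share of the estimate. Services without an override stay at the 100 percent default.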

If you configure Amazon Simple Notification Service (SNS) to receive notifications about DevOps Guru events, you will incur additional charges per standard Amazon SNS pricing. Similarly, if you configure DevOps Guru to create an OpsItem for its insights, you will incur additional charges per standard AWS Systems Manager pricing.

Yes, AWS Free Tier includes DevOps Guru analysis of 7,200 AWS resource hours each for resource groups A and B and usage of 10,000 DevOps Guru API calls per month for three months.
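To see how the free allowance interacts with the hourly rates, the sketch below subtracts the free resource hours before applying the rate. It assumes the 7,200-hour allowance applies per price group per month during the free period, which is one reading of the sentence above, so confirm the terms on the pricing page. The function name is hypothetical.

```python
FREE_HOURS_PER_GROUP = 7200  # assumed to apply per month, per price group
RATE_PER_HOUR = {"A": 0.0028, "B": 0.0042}  # rates quoted in this section


def charge_after_free_tier(group, resource_hours):
    """Charge in USD for one month after subtracting the free hours."""
    billable_hours = max(0, resource_hours - FREE_HOURS_PER_GROUP)
    return round(billable_hours * RATE_PER_HOUR[group], 4)


# 10 group A resources active for all 720 hours use exactly 7,200 hours.
print(charge_after_free_tier("A", 10 * 720))
# 15 such resources use 10,800 hours, leaving 3,600 billable hours.
print(charge_after_free_tier("A", 15 * 720))
```

Under this assumption, 7,200 hours corresponds to ten resources in one group that are active continuously for a 30-day month.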

Amazon DevOps Guru is available in the following AWS regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm), Europe (London), Europe (Paris), Asia Pacific (Mumbai), Asia Pacific (Seoul), South America (São Paulo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo), with additional regions coming soon. You can also refer to the AWS Regional Services List.