AI Operations
Leverage AI to quickly identify, diagnose, and remediate operational issues
Overview
Leverage the extensive operational experience that AWS has accumulated and refined over 19 years of delivering cloud services to millions of customers globally. We've applied AI and machine learning (ML) to help enhance, accelerate, and automate your cloud operations processes. AIOps allows you to easily observe your workloads, accelerate operational troubleshooting, and take actions to resolve and remediate operational issues, improving mean time to recovery (MTTR). 
Find root cause of issues in a fraction of the time
Start an operational investigation from anywhere in the AWS Management Console. You can configure Amazon CloudWatch to begin an investigation as soon as an alarm goes off, or create an investigation from an Amazon Q chat. CloudWatch works alongside you in the investigation, helping you identify anomalies in your applications and develop hypotheses about the root cause of issues.
 
 
Quickly resolve issues using remediation suggestions
Amazon CloudWatch suggests remediation actions for common AWS issues by surfacing relevant AWS Systems Manager Automation runbooks, AWS re:Post articles, and documentation. Run the suggested runbook to resolve the issue and quickly restore your business-critical applications to full operation.
Read about Amazon CloudWatch investigations in the documentation
 
 
Empower operators of all experience levels
Amazon CloudWatch takes on the heavy lifting of the troubleshooting process so you don’t have to be an expert on all of your application resources. During an operational investigation, CloudWatch sifts through hundreds of thousands of data points to discover relationships between services and develop an understanding of how they work together. After analyzing its findings, CloudWatch presents you with potential hypotheses for the root cause of the issue and guides you through how to resolve it.
 
 
Automatically detect anomalies and patterns
Amazon CloudWatch uses advanced machine learning (ML) to automatically set baselines and detect anomalies in your telemetry data, removing the need to manually sift through your metrics and logs. Get alerts on spikes or unusual patterns to address issues before they escalate. CloudWatch highlights recurring patterns and key values such as severity levels, helping you to quickly zero in on relevant logs or compare behavior over time to spot problems faster.
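The idea of setting a baseline and flagging values that fall outside it can be illustrated with a minimal sketch. This is not CloudWatch's algorithm: the page does not describe the ML models involved, so a simple mean-and-standard-deviation band is swapped in here for illustration. The function names, the `width` parameter, and the latency samples are all hypothetical.

```python
from statistics import mean, stdev


def anomaly_band(history, width=2.0):
    """Return (lower, upper) bounds: mean +/- width * sample standard deviation.

    Illustrative stand-in only; not the model CloudWatch uses.
    """
    if len(history) < 2:
        raise ValueError("need at least two data points to form a baseline")
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def find_anomalies(history, new_points, width=2.0):
    """Return (index, value) pairs from new_points that fall outside the band."""
    lower, upper = anomaly_band(history, width)
    return [(i, v) for i, v in enumerate(new_points) if v < lower or v > upper]


# Hypothetical latency samples in milliseconds.
baseline = [100, 102, 98, 101, 99, 103, 97, 100]
incoming = [101, 99, 180, 100]

print(find_anomalies(baseline, incoming))
```

With these samples the baseline has a mean of 100 and a sample standard deviation of 2, so the band is roughly 96 to 104 and only the 180 ms point is flagged. A fixed band like this ignores trends and recurring cycles, which is one reason a managed, ML-based baseline is useful in practice.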
 
Query telemetry data using natural language
Extract insights from your telemetry without needing to learn complex query languages. Ask a question in plain English, such as “Show me the 10 slowest AWS Lambda requests in the last 24 hours,” and Amazon CloudWatch generates the corresponding query syntax for you. Using the natural language summarization capability in CloudWatch Logs Insights, you can also generate summaries of your query results to help you quickly identify issues and gain actionable insights from your log data.
Read how you can use natural language queries and natural language summarization in the documentation
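For a sense of what a prompt like the Lambda example could translate into, here is a hand-written sketch of an equivalent CloudWatch Logs Insights query and the arguments for the boto3 `start_query` call. The query CloudWatch actually generates may differ; the log group name and helper function names are assumptions made for illustration, and the query relies on the `@type`, `@requestId`, and `@duration` fields that Logs Insights discovers in Lambda logs.

```python
from datetime import datetime, timedelta, timezone

# Hand-written equivalent of "the 10 slowest Lambda requests";
# not necessarily the syntax CloudWatch would generate.
SLOWEST_REQUESTS_QUERY = (
    'filter @type = "REPORT" '
    "| fields @requestId, @duration "
    "| sort @duration desc "
    "| limit 10"
)


def build_start_query_args(log_group_name, now=None):
    """Build keyword arguments for logs.start_query covering the last 24 hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=24)
    return {
        "logGroupName": log_group_name,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(now.timestamp()),      # epoch seconds
        "queryString": SLOWEST_REQUESTS_QUERY,
    }


def run_query(args):
    """Submit the query; requires boto3 and AWS credentials (not run here)."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("logs")
    return client.start_query(**args)["queryId"]


# "/aws/lambda/example-function" is a hypothetical log group name.
example_args = build_start_query_args("/aws/lambda/example-function")
print(example_args["queryString"])
```

Building the arguments separately from submitting them keeps the query easy to inspect before it is run against a real account.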
 
 
Customers
Cedar Gate Technologies
Healthcare technology provider Cedar Gate Technologies can now identify the root cause of operational issues in about 30 minutes, down from two hours, by using Amazon CloudWatch to accelerate investigations and resolve issues swiftly, so that its clients can continue providing valuable care to their patients without interruption.
 
 
Amazon Kindle
Amazon Kindle support engineers have seen 65-80% faster issue resolution while using Amazon CloudWatch for investigations, helping them address customer needs more quickly and provide the best user experience.
 
 
 
Amazon Music
Amazon Music developers are using Amazon CloudWatch as a 24/7 assistant to automate investigations and identify trends across issues, helping them move faster during their on-call shifts. Early usage shows that Amazon Music is resolving issues twice as fast, so that listeners can continue to enjoy their favorite songs.
 
 
 
SmugMug
Photo-management platform SmugMug will use Amazon CloudWatch to automatically analyze metrics, logs, and operational events across its systems, enabling the team to diagnose most issues in under 20 minutes, up to 50% faster than before. This improves operational efficiency by reducing manual log searches, so the team can spend less time and fewer resources managing issues and more time building the platform to help photographers grow their digital storefronts.
