Amazon CloudWatch

AI Operations

Leverage AI to quickly identify, diagnose, and remediate operational issues

Overview

Leverage the extensive operational experience that AWS has accumulated and refined over 19 years of delivering cloud services to millions of customers globally. We've applied AI and machine learning (ML) to help enhance, accelerate, and automate your cloud operations processes. AIOps allows you to observe your workloads, accelerate operational troubleshooting, and take actions to resolve and remediate operational issues, reducing mean time to recovery (MTTR).

Find the root cause of issues in a fraction of the time

Start an operational investigation from anywhere in the AWS Management Console. You can configure Amazon CloudWatch to begin an investigation as soon as an alarm goes off, or create an investigation from an Amazon Q chat. CloudWatch works alongside you in the investigation, helping you identify anomalies in your applications and form hypotheses about the root cause of issues.

Start CloudWatch investigations interactive demo

Quickly resolve issues using remediation suggestions

Amazon CloudWatch suggests remediation actions for common AWS issues by surfacing relevant AWS Systems Manager Automation runbooks, AWS re:Post articles, and documentation. Run the suggested runbook to resolve the issue and get your business-critical applications back to full operation quickly.

Read about Amazon CloudWatch investigations in documentation

Empower operators of all experience levels

Amazon CloudWatch takes on the heavy lifting of the troubleshooting process so you don’t have to be an expert on all of your application resources. During an operational investigation, CloudWatch sifts through hundreds of thousands of data points to discover relationships between services and develop an understanding of how they work together. After analyzing its findings, CloudWatch presents you with potential hypotheses for the root cause of the issue and guides you through how to resolve it.

View a sample investigation

Automatically detect anomalies and patterns

Amazon CloudWatch uses advanced machine learning (ML) to automatically set baselines and detect anomalies in your telemetry data, removing the need to manually sift through your metrics and logs. Get alerts on spikes or unusual patterns to address issues before they escalate. CloudWatch highlights recurring patterns and key values such as severity levels, helping you to quickly zero in on relevant logs or compare behavior over time to spot problems faster.

Read about CloudWatch anomaly detection in documentation
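
To make the idea of a baseline concrete, here is a minimal sketch of band-based anomaly detection in Python. It is not the model CloudWatch uses; it assumes a simple rolling mean and standard deviation as the baseline, and the function name, the `window` and `band_width` parameters, and the sample latencies are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=12, band_width=2.0):
    """Return indices of points that fall outside a rolling baseline band.

    The band for each point is the mean of the previous `window` points,
    plus or minus `band_width` sample standard deviations.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        lower = baseline - band_width * spread
        upper = baseline + band_width * spread
        if not (lower <= values[i] <= upper):
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples in milliseconds, with one spike at index 13.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 100, 250, 101]
print(detect_anomalies(latencies_ms))  # prints [13]
```

A production baseline would also need to handle seasonality and trend, and would keep detected spikes from inflating the band for the points that follow them; this sketch does neither.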

Query telemetry data using natural language

Extract insights from your telemetry without needing to learn complex query languages. You can ask questions in plain English, such as “Show me the 10 slowest AWS Lambda requests in the last 24 hours,” and Amazon CloudWatch will generate the query syntax automatically. Using the natural language summarization capability in CloudWatch Logs Insights, you can generate summaries from your query results to help you quickly identify issues and gain actionable insights from your log data.

Read how you can use natural language queries and natural language summarization in documentation
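
For a sense of what such a question translates to, here is a hand-written sketch in Python of a CloudWatch Logs Insights query for the 10 slowest Lambda invocations over the last 24 hours. The query that CloudWatch generates for the same prompt may differ. The helper names and the log group `/aws/lambda/my-function` are hypothetical; the query relies on the `@type`, `@requestId`, and `@duration` fields that Logs Insights discovers in Lambda logs.

```python
import time


def slowest_lambda_requests_query(limit=10):
    """Build a Logs Insights query for the slowest Lambda invocations."""
    return (
        'filter @type = "REPORT"\n'      # keep only the per-invocation REPORT lines
        "| fields @requestId, @duration\n"
        "| sort @duration desc\n"        # slowest first
        f"| limit {limit}"
    )


def build_start_query_params(log_group_name, hours=24, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery operation.

    Times are epoch seconds; `now` can be passed in to make the result
    deterministic.
    """
    end_time = int(now if now is not None else time.time())
    start_time = end_time - hours * 3600
    return {
        "logGroupName": log_group_name,
        "startTime": start_time,
        "endTime": end_time,
        "queryString": slowest_lambda_requests_query(),
    }


# Hypothetical log group name, used for illustration only.
params = build_start_query_params("/aws/lambda/my-function")
print(params["queryString"])
```

With boto3 installed and credentials configured, these parameters could be passed to `boto3.client("logs").start_query(**params)`, and the results retrieved with `get_query_results` using the returned query ID.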

Customers

Cedar Gate Technologies

Healthcare technology provider Cedar Gate Technologies can now identify the root cause of operational issues in about 30 minutes, down from two hours, by using Amazon CloudWatch to accelerate investigations and swiftly resolve issues so that clients can continue providing valuable care to their patients without interruption.

Amazon Kindle

Amazon Kindle support engineers have seen 65-80% faster issue resolution while using Amazon CloudWatch for investigations, helping them address customer needs more quickly and provide the best user experience.

Amazon Music

Amazon Music developers are using Amazon CloudWatch as a 24/7 assistant to automate investigations and identify trends across issues, helping them move faster during their on-call shifts. Early usage shows that Amazon Music is resolving issues twice as fast, so that listeners can continue to enjoy their favorite songs.

SmugMug

Photo-management platform SmugMug will use Amazon CloudWatch to automatically analyze metrics, logs, and operational events across their systems, enabling them to diagnose most issues in under 20 minutes, up to 50% faster. This improves operational efficiency by reducing manual log searches, so their team can spend less time and fewer resources managing issues and more time building the platform to help photographers grow their digital storefronts.
