Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, and optimize resource utilization. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. You get a unified view of operational health and gain complete visibility into your AWS resources, applications, and services running on AWS and on premises. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
Use a single platform for observability
Modern applications, such as those running on microservices architectures, generate large volumes of data in the form of metrics, logs, and events. Amazon CloudWatch allows you to collect, access, and correlate this data on a single platform from across all your AWS resources, applications, and services running on AWS and on premises, helping you break down data silos, gain system-wide visibility, and quickly resolve issues.
Collect metrics on AWS and on premises
CloudWatch makes it straightforward to monitor your AWS resources and applications. It natively integrates with more than 70 AWS services, such as Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, Amazon EKS, and AWS Lambda. It automatically publishes detailed one-minute metrics for these services and accepts custom metrics with up to one-second granularity, so you can dive deep into your logs for additional context. You can also use CloudWatch in hybrid environments by using the CloudWatch Agent or API to monitor your on-premises resources.
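As a rough illustration of publishing a custom metric at one-second granularity, the sketch below builds a request for the PutMetricData API using the AWS SDK for Python (boto3). The namespace MyApp, the metric name QueueDepth, the Service dimension, and the helper function names are hypothetical. The boto3 call is kept in a separate function so the payload can be built and inspected without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None, high_resolution=True):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # StorageResolution=1 requests one-second (high-resolution) storage;
        # 60 is the standard one-minute resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def publish(namespace, data):
    """Send metric data to CloudWatch.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical metric: depth of a work queue for a "checkout" service.
datum = build_metric_datum("QueueDepth", 42, dimensions={"Service": "checkout"})
# publish("MyApp", [datum])  # uncomment to send; "MyApp" is a hypothetical namespace
```

The same call can be made from an on-premises host, provided that host has credentials allowed to call PutMetricData.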
Improve operational performance and resource optimization
Set alarms and automate actions based on predefined thresholds or on machine learning (ML) algorithms that identify anomalous behavior in your metrics. For example, you can start Amazon EC2 Auto Scaling automatically or stop an instance to reduce billing overages. You can also use CloudWatch Events to trigger serverless workflows with services like AWS Lambda, Amazon SNS, and AWS CloudFormation.
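To make the threshold-based case concrete, here is a minimal sketch of an alarm that stops an EC2 instance when its average CPU stays low, built as parameters for the PutMetricAlarm API in boto3. The alarm name, the 5 percent threshold, the five-minute period, the instance ID, and the helper names are illustrative assumptions, not recommended values.

```python
def build_idle_instance_alarm(instance_id, region, threshold=5.0, periods=3):
    """Build PutMetricAlarm parameters that stop an idle EC2 instance."""
    return {
        "AlarmName": f"idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds per evaluation period
        "EvaluationPeriods": periods,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 action ARN that stops the instance when the alarm fires.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def create_alarm(params, region):
    """Create the alarm (assumes boto3 is installed and credentials are configured)."""
    import boto3

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


# The instance ID below is a placeholder, not a real resource.
params = build_idle_instance_alarm("i-0123456789abcdef0", "us-east-1")
# create_alarm(params, "us-east-1")  # uncomment to create the alarm
```

With these defaults the alarm fires after three consecutive five-minute periods below the threshold, which trades a slower reaction for fewer false stops.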
Get operational visibility and insight
To optimize performance and resource utilization, you need a unified operational view, real-time granular data, and historical reference. CloudWatch provides automatic dashboards, data with one-second granularity, and up to 15 months of metrics storage and retention. You can also perform metric math on your data to derive operational and utilization insights; for example, you can aggregate usage across an entire fleet of EC2 instances.
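The fleet aggregation mentioned above can be sketched with metric math in a GetMetricData request. The example below builds one query per instance plus an AVG(METRICS()) expression that averages them into a single time series. The instance IDs, query Ids, label, and helper names are hypothetical, and the boto3 call assumes configured credentials.

```python
def build_fleet_cpu_queries(instance_ids, period=300):
    """Build MetricDataQueries that average CPUUtilization across instances."""
    queries = []
    for i, instance_id in enumerate(instance_ids):
        queries.append({
            "Id": f"m{i}",  # query Ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,  # only the aggregated series is returned
        })
    queries.append({
        "Id": "fleet_avg",
        # METRICS() refers to every raw metric in the request.
        "Expression": "AVG(METRICS())",
        "Label": "Fleet average CPU",
        "ReturnData": True,
    })
    return queries


def fetch(queries, start, end):
    """Run the queries (assumes boto3 is installed and credentials are configured)."""
    import boto3

    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=queries, StartTime=start, EndTime=end
    )


# Placeholder instance IDs for illustration only.
queries = build_fleet_cpu_queries(["i-0aaa", "i-0bbb", "i-0ccc"])
```

Setting ReturnData to False on the per-instance queries keeps the response to the single aggregated series.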
Derive actionable insights from logs
Explore, analyze, and visualize your logs so you can troubleshoot operational problems with ease. With CloudWatch Logs Insights, you pay only for the queries you run. It scales with your log volume and query complexity, giving you answers in seconds. In addition, you can publish log-based metrics, create alarms, and correlate logs and metrics together in CloudWatch Dashboards for complete operational visibility.
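As an example of what a CloudWatch Logs Insights query might look like in practice, the sketch below defines a query that returns the 20 most recent log events containing ERROR, and a small helper that flattens the query results into dictionaries. The query text, the polling interval, and the function names are illustrative assumptions. The boto3 calls assume configured credentials and an existing log group.

```python
import time

# Hypothetical query: the 20 most recent events whose message contains "ERROR".
ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def rows_to_dicts(results):
    """Flatten GetQueryResults rows (lists of {field, value} cells) into dicts."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def run_query(log_group, query, start, end, poll_seconds=1.0):
    """Start a Logs Insights query and wait for it to finish.

    start and end are epoch seconds. Assumes boto3 is installed and
    AWS credentials are configured.
    """
    import boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return rows_to_dicts(response["results"])
        time.sleep(poll_seconds)
```

Because you pay for the queries you run, narrowing the time range passed as start and end is usually the simplest way to keep a query small.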
How it works
CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it using automated dashboards so you can get a unified view of your AWS resources, applications, and services that run on AWS and on premises. You can visualize the experience of your application end users and validate design choices through experimentation. Correlate your metrics and logs to better understand the health and performance of your resources. Create alarms based on metric value thresholds you specify, or alarms that can watch for anomalous metric behavior based on ML algorithms. For example, set up automated actions to notify you if an alarm is triggered and automatically start auto scaling to help reduce mean time to resolution (MTTR). You can also dive deep and analyze your metrics, logs, and traces to better understand how to improve application performance.
Monitor and troubleshoot infrastructure
Monitor key metrics and logs, visualize your application and infrastructure stack, create alarms, and correlate data to understand and resolve the root cause of performance issues in your AWS resources. This includes monitoring your container ecosystem across Amazon ECS, AWS Fargate, Amazon EKS, and Kubernetes.
Improve mean time to resolution
Correlate, visualize, and analyze metrics and logs so you can resolve issues quickly, and combine them with trace data from AWS X-Ray for full observability. You can also analyze user requests to speed up troubleshooting and debugging, and reduce overall MTTR.
Optimize resources proactively
CloudWatch alarms watch your metric values against thresholds that you specify or that CloudWatch creates using ML models to detect anomalous behavior. If an alarm is triggered, CloudWatch can act automatically to enable Amazon EC2 Auto Scaling or stop an instance, so you can automate capacity and resource planning.
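For the ML-based case, an alarm can compare a metric against an anomaly detection band rather than a fixed threshold. The following is a minimal sketch of PutMetricAlarm parameters for such an alarm. The alarm name, namespace, metric, dimension, band width of 2, and evaluation settings are assumptions chosen for illustration.

```python
def build_anomaly_alarm(name, namespace, metric, dimensions, band_width=2, action_arns=()):
    """Build PutMetricAlarm parameters for an anomaly detection alarm."""
    return {
        "AlarmName": name,
        # Fire when the metric leaves the expected band in either direction.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "ad1",  # points at the band expression below
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric,
                        "Dimensions": [
                            {"Name": k, "Value": v} for k, v in dimensions.items()
                        ],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            {
                "Id": "ad1",
                # A larger band width gives a wider band and fewer alarms.
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "Expected range",
                "ReturnData": True,
            },
        ],
        "AlarmActions": list(action_arns),
    }


# Hypothetical alarm on the CPU of one placeholder instance.
alarm = build_anomaly_alarm(
    "cpu-anomaly", "AWS/EC2", "CPUUtilization", {"InstanceId": "i-0123456789abcdef0"}
)
# import boto3; boto3.client("cloudwatch").put_metric_alarm(**alarm)  # needs credentials
```

Passing an Auto Scaling policy ARN or an EC2 action ARN in action_arns would connect the alarm to the automated responses described above.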
Monitor applications
Monitor your end users’ digital experience and your applications that run on AWS (on Amazon EC2, containers, and serverless) and on premises. CloudWatch collects data at every layer of the performance stack, from your front end to your infrastructure. You can use ServiceLens to identify performance bottlenecks in your applications and isolate them using the correlated metrics, logs, and traces. Add canaries for SLA/SLO monitoring of endpoints and UI workflows. Collect client-side data on application performance in near real time to identify and debug issues that impact end users. Experiment with features across the full application stack, measure against performance and business metrics, and launch features safely.
Use observability analytics
Analyze millions of operational logs and metrics in near real time to identify trends and patterns in your application performance, and use these insights to reduce MTTR. Use fast and interactive operational queries to create powerful visualizations, helping you monitor and pinpoint issues quickly.
“We use a microservices-based architecture. Amazon CloudWatch was an instant solution as it required no infrastructure setup or maintenance. CloudWatch has no issues handling our scale and removed the operational burden of integrating and managing multiple tools. The most important benefit for us is the decrease in MTTR (mean time to repair), as our DevOps team can quickly find issues across our container infrastructure.”
- Vitaliy Geraymovych, Co-founder and Vice President, Engineering, CloudPassage
Customers use Amazon CloudWatch to improve operational performance, optimize resource allocation, and reduce MTTR. To learn more about how organizations use Amazon CloudWatch, visit our customers page.
EBSCO uses Amazon CloudWatch Synthetics to simulate user journeys to monitor uptime availability of its applications, proactively identify issues, and easily debug them.
Mapbox uses Amazon CloudWatch to ingest multiple data sources and monitor key workloads.
Pushpay uses Amazon CloudWatch Logs Insights to query logs and reduce operational complexity.
Rackspace uses the Amazon CloudWatch Agent to monitor its virtual machines.
SendGrid uses Amazon CloudWatch natively without needing a self-managed stack or third-party vendor.