Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. You can collect and access all your performance and operational data in the form of logs and metrics from a single platform rather than monitoring them in silos (server, network, or database). CloudWatch enables you to monitor your complete stack (applications, infrastructure, and services) and use alarms, logs, and events data to take automated actions and reduce mean time to resolution (MTTR). This frees up important resources and allows you to focus on building applications and business value.
CloudWatch gives you actionable insights that help you optimize application performance, manage resource utilization, and understand system-wide operational health. CloudWatch provides up to one-second visibility of metrics and logs data, 15 months of data retention (metrics), and the ability to perform calculations on metrics. This allows you to perform historical analysis for cost optimization and derive real-time insights into optimizing applications and infrastructure resources. You can use CloudWatch Container Insights to monitor, troubleshoot, and alarm on your containerized applications and microservices. CloudWatch collects, aggregates, and summarizes compute utilization information such as CPU, memory, disk, and network data, as well as diagnostic information such as container restart failures, to help DevOps engineers isolate issues and resolve them quickly. Container Insights gives you insights from container management services such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, and standalone Kubernetes (k8s).
Easily collect and store logs
The Amazon CloudWatch Logs service allows you to collect and store logs from your resources, applications, and services in near real time. There are three main categories of logs:
1) Vended logs. These are natively published by AWS services on your behalf. Currently, Amazon VPC Flow Logs and Amazon Route 53 logs are the two supported types.
2) Logs published by AWS services. Currently, more than 30 AWS services publish logs to CloudWatch. They include Amazon API Gateway, AWS Lambda, AWS CloudTrail, and many others.
3) Custom logs. These are logs from your own application and on-premises resources.
You can use AWS Systems Manager to install the CloudWatch agent, or you can use the PutLogEvents API action to publish logs directly.
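As a rough illustration of publishing custom logs through the API, the sketch below builds the request for the CloudWatch Logs PutLogEvents action using only the Python standard library. The log group name, stream name, and messages are hypothetical placeholders, and the helper function is an assumption made for this example rather than part of any AWS SDK.

```python
import time


def build_put_log_events_request(log_group, log_stream, messages, now_ms=None):
    """Build keyword arguments for the CloudWatch Logs PutLogEvents action."""
    if now_ms is None:
        # Event timestamps are expressed in milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    events = [
        {"timestamp": now_ms + offset, "message": message}
        for offset, message in enumerate(messages)
    ]
    # Keep the batch in chronological order.
    events.sort(key=lambda event: event["timestamp"])
    return {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "logEvents": events,
    }


# Hypothetical log group, stream, and messages.
request = build_put_log_events_request(
    "/example/app",
    "host-1",
    ["service started", "ERROR: upstream timeout"],
    now_ms=1_700_000_000_000,
)
```

With the AWS SDK for Python installed and credentials configured, a request like this could be sent with boto3.client("logs").put_log_events(**request), provided the log group and log stream already exist.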
Collect and aggregate infrastructure and application metrics
Amazon CloudWatch allows you to collect infrastructure metrics from more than 70 AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon ECS, AWS Lambda, and Amazon API Gateway, with no action on your part. For example, Amazon EC2 instances automatically publish CPU utilization, data transfer, and disk usage metrics to help you understand changes in state. You can use built-in metrics for API Gateway to detect latency, or use built-in metrics for AWS Lambda to detect errors or throttles. Likewise, Amazon CloudWatch allows you to collect application metrics (such as user activity, error metrics, or memory used) from your own applications to monitor operational performance, troubleshoot issues, and spot trends. You can use the CloudWatch agent or the PutMetricData API action to publish these metrics to CloudWatch. If you need more detailed metrics beyond the default infrastructure metrics, such as shard-level Amazon Kinesis Data Streams metrics, you can opt in per resource. Custom application metrics are available at up to one-second frequency and can be used in statistics, graphs, and high-resolution alarms.
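A minimal sketch of publishing an application metric follows. It builds the parameters for the PutMetricData action, setting StorageResolution to 1 to request one-second (high-resolution) storage. The namespace, metric name, and dimension values are hypothetical, and the helper function is an assumption for illustration.

```python
def build_put_metric_data_request(namespace, name, value, unit="None",
                                  dimensions=None, high_resolution=False):
    """Build keyword arguments for the CloudWatch PutMetricData action."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Dimensions": [
            {"Name": key, "Value": val}
            for key, val in sorted((dimensions or {}).items())
        ],
        # 1 requests high-resolution (one-second) storage; 60 is standard.
        "StorageResolution": 1 if high_resolution else 60,
    }
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical application metric: memory used by one service instance.
metric_request = build_put_metric_data_request(
    "ExampleApp",
    "MemoryUsed",
    512,
    unit="Megabytes",
    dimensions={"Service": "checkout"},
    high_resolution=True,
)
```

With boto3 available, the result could be passed as boto3.client("cloudwatch").put_metric_data(**metric_request).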
Collect and aggregate container metrics and logs
Container Insights simplifies the collection and aggregation of curated metrics and container ecosystem logs. It collects compute performance metrics such as CPU, memory, network, and disk information from each container as performance events and automatically generates custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch Logs with metadata about the running environment, such as the Amazon EC2 instance ID, Service, and Amazon Elastic Block Store (Amazon EBS) volume mount and ID, to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. Container Insights also provides an option to collect application logs (stdout/stderr), custom logs, predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs, and Amazon EKS control plane logs. For Amazon EKS and k8s clusters, a preconfigured FluentD agent can be used to collect your logs. See the Container Insights logs setup documentation for more details. For Amazon ECS, the Amazon CloudWatch Logs logging driver or Fluent Bit can be used to collect application logs.
Collect and aggregate Lambda metrics and logs
CloudWatch Lambda Insights simplifies the collection and aggregation of curated metrics and logs from AWS Lambda functions. It collects compute performance metrics such as CPU, memory, and network from each Lambda function as performance events, while automatically generating custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch Logs to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. See the Lambda Insights getting started documentation for more details.
Amazon CloudWatch Metric Streams enables you to create continuous, near-real-time streams of metrics to a destination of your choice. This makes it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream including the most up-to-date CloudWatch metrics data to power dashboards, alarms, and other tools that rely on accurate and timely metric data. Easily direct your metrics to your data lake on AWS (such as on Amazon S3) and start analyzing usage or performance with tools such as Amazon Athena.
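As one possible way to set up such a stream programmatically, the sketch below builds parameters for the CloudWatch PutMetricStream action. The stream name and both ARNs are hypothetical placeholders, and the choice of JSON output and the namespace filters are assumptions for this example.

```python
def build_metric_stream_request(name, firehose_arn, role_arn, namespaces):
    """Build keyword arguments for the CloudWatch PutMetricStream action."""
    return {
        "Name": name,
        "FirehoseArn": firehose_arn,  # delivery stream that receives the metrics
        "RoleArn": role_arn,          # role CloudWatch assumes to write to it
        "OutputFormat": "json",
        # Stream only the listed namespaces instead of every metric.
        "IncludeFilters": [{"Namespace": namespace} for namespace in namespaces],
    }


# Hypothetical account ID, delivery stream, and role.
stream_request = build_metric_stream_request(
    "example-metric-stream",
    "arn:aws:firehose:us-east-1:123456789012:deliverystream/example-stream",
    "arn:aws:iam::123456789012:role/example-metric-stream-role",
    ["AWS/EC2", "AWS/Lambda"],
)
```

With boto3 available, this could be submitted as boto3.client("cloudwatch").put_metric_stream(**stream_request).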
Unified operational view with dashboards
Amazon CloudWatch dashboards enable you to create reusable graphs and visualize your cloud resources and applications in a unified view. You can graph metrics and logs data side by side in a single dashboard to quickly get the context and move from diagnosing the problem to understanding the root cause. For example, you can visualize key metrics, such as CPU utilization and memory, and compare them to capacity. You can also correlate the log pattern of a specific metric and set alarms to alert you to performance and operational issues. This gives you system-wide visibility into operational health and the ability to quickly troubleshoot issues, reducing MTTR.
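A dashboard is defined by a JSON body made up of widgets. The sketch below assembles a body with one metric widget and one log widget placed side by side, roughly as described above. The Region, instance ID, log group name, and widget layout are hypothetical choices for this example.

```python
import json


def build_dashboard_body(region, instance_id, log_group):
    """Return a dashboard body (JSON string) with a metric and a log widget."""
    metric_widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "EC2 CPU utilization",
            "region": region,
            "stat": "Average",
            "period": 300,
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
        },
    }
    log_widget = {
        "type": "log",
        "x": 12, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Recent application errors",
            "region": region,
            "view": "table",
            # Logs Insights query scoped to a single log group.
            "query": (
                f"SOURCE '{log_group}' | fields @timestamp, @message"
                " | filter @message like /ERROR/ | sort @timestamp desc | limit 20"
            ),
        },
    }
    return json.dumps({"widgets": [metric_widget, log_widget]})


# Hypothetical Region, instance ID, and log group.
dashboard_body = build_dashboard_body("us-east-1", "i-0123456789abcdef0", "/example/app")
```

With boto3 available, the body could be saved with boto3.client("cloudwatch").put_dashboard(DashboardName="example-dashboard", DashboardBody=dashboard_body).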
With Amazon CloudWatch composite alarms, you can combine multiple alarms and reduce alarm noise. If an issue affects several resources in an application, you will receive a single alarm notification for the entire application instead of one for each affected resource. This helps you focus on finding the root cause of operational issues to reduce application downtime. You can provide an overall state for a grouping of resources, such as an application, AWS Region, or Availability Zone.
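A composite alarm is driven by a rule expression over the states of other alarms. The sketch below builds such an expression, and the parameters for the PutCompositeAlarm action, so that a single notification fires when any of several child alarms is in the ALARM state. The alarm names and the SNS topic ARN are hypothetical.

```python
def build_composite_alarm_request(name, child_alarm_names, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutCompositeAlarm action."""
    # Go into ALARM when any child alarm is in ALARM; AND and NOT are also
    # available in rule expressions for stricter combinations.
    rule = " OR ".join(f'ALARM("{child}")' for child in child_alarm_names)
    return {
        "AlarmName": name,
        "AlarmRule": rule,
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical child alarms and notification topic.
composite_request = build_composite_alarm_request(
    "example-app-unhealthy",
    ["example-web-cpu-high", "example-db-latency-high"],
    "arn:aws:sns:us-east-1:123456789012:example-ops-topic",
)
```

With boto3 available, this could be created with boto3.client("cloudwatch").put_composite_alarm(**composite_request), assuming the child alarms already exist.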
Amazon CloudWatch alarms allow you to set a threshold on metrics and trigger an action. You can create high-resolution alarms, set a percentile as the statistic, and either specify an action or take no action, as appropriate. For example, you can create alarms on Amazon EC2 metrics, set notifications, and take one or more actions to detect and shut down unused or underutilized instances. Real-time alarming on metrics and events enables you to minimize downtime and potential business impact.
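To make the alarm options concrete, the sketch below builds parameters for the PutMetricAlarm action describing a high-resolution alarm on a percentile statistic. The namespace, metric name, threshold, and SNS topic ARN are hypothetical, and the metric is assumed to be a custom metric published at high resolution.

```python
def build_latency_alarm_request(sns_topic_arn, threshold_ms=500.0):
    """Build keyword arguments for the CloudWatch PutMetricAlarm action."""
    return {
        "AlarmName": "example-p99-latency-high",
        "Namespace": "ExampleApp",        # hypothetical custom namespace
        "MetricName": "RequestLatency",   # hypothetical high-resolution metric
        "ExtendedStatistic": "p99",       # percentile instead of Average or Sum
        "Period": 10,                     # 10-second period: high-resolution alarm
        "EvaluationPeriods": 3,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


alarm_request = build_latency_alarm_request(
    "arn:aws:sns:us-east-1:123456789012:example-ops-topic"
)
```

With boto3 available, this could be created with boto3.client("cloudwatch").put_metric_alarm(**alarm_request). Leaving AlarmActions empty would give an alarm that changes state without triggering an action.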
Logs and metrics correlation
Applications and infrastructure resources generate large amounts of operational and monitoring data in the form of logs and metrics. In addition to letting you access and visualize these datasets in a single platform, Amazon CloudWatch also makes it easy to correlate them. This helps you quickly move from diagnosing the problem to understanding the root cause. For example, you can correlate a log pattern, such as an error, to a specific metric and set alarms to alert you to performance and operational issues.
Amazon CloudWatch Application Insights provides automated setup of observability for your enterprise applications so you can get visibility into their health. It helps you identify and set up key metrics and logs across your application resources and technology stack, such as database, web (IIS) and application servers, operating system, load balancers, and queues. It constantly monitors this telemetry data to detect and correlate anomalies and errors to notify you of any problems in your application. To aid in troubleshooting, it creates automated dashboards for the detected problems with correlated metric anomalies and log errors, along with additional insights to point you to their potential root cause. This enables you to take quick remedial actions to ensure that your applications are healthy and end users are not impacted.
Container monitoring insights
Container Insights provides automatic dashboards in the CloudWatch console. These dashboards summarize the compute performance, errors, and alarms by cluster, pod/task, and service. For Amazon EKS and k8s, dashboards are also available for nodes/EC2 instances and namespaces. Each dashboard summarizes the list of running pods/tasks or containers by CPU and memory for the selected time window. You can dive deeper into application logs, AWS X-Ray traces, and performance events contextually, based on time window and selected pod/task or container.
Lambda monitoring insights
Lambda Insights provides automatic dashboards in the CloudWatch console. These dashboards summarize the compute performance and errors. Each dashboard includes the list of metrics for the selected time window and allows you to dive deeper contextually (based on time window and selected function) into application logs, AWS X-Ray traces, and performance events.
Amazon CloudWatch Anomaly Detection applies machine-learning (ML) algorithms to continuously analyze metric data and identify anomalous behavior. It allows you to create alarms that auto-adjust thresholds based on natural metric patterns, such as time of day, day of week, seasonality, or changing trends. You can also visualize metrics with anomaly detection bands on dashboards. This enables you to monitor, isolate, and troubleshoot unexpected changes in your metrics.
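An anomaly detection alarm compares a metric against a band computed by the ANOMALY_DETECTION_BAND metric math function rather than a fixed threshold. The sketch below builds PutMetricAlarm parameters for such an alarm. The alarm name, instance ID, and band width of 2 are hypothetical choices for this example.

```python
def build_anomaly_alarm_request(instance_id, band_width=2):
    """Build PutMetricAlarm parameters for an anomaly detection alarm."""
    return {
        "AlarmName": f"example-cpu-anomaly-{instance_id}",
        # Alarm when the metric leaves the expected band in either direction.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "band",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/EC2",
                        "MetricName": "CPUUtilization",
                        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                # A larger second argument produces a wider, more tolerant band.
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "Expected range",
                "ReturnData": True,
            },
        ],
    }


anomaly_request = build_anomaly_alarm_request("i-0123456789abcdef0")
```

With boto3 available, this could be created with boto3.client("cloudwatch").put_metric_alarm(**anomaly_request).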
You can use Amazon CloudWatch ServiceLens to visualize and analyze the health, performance, and availability of your applications in a single place. It ties together CloudWatch metrics and logs as well as traces from AWS X-Ray to give you a complete view of your applications and their dependencies. Quickly pinpoint performance bottlenecks, isolate root causes of application issues, and determine the impact on users. CloudWatch ServiceLens lets you gain visibility into your applications in three main areas: infrastructure monitoring (using metrics and logs to understand the resources supporting your applications), transaction monitoring (using traces to understand dependencies between your resources), and end-user monitoring (using canaries to monitor your endpoints and notify you when the end-user experience has degraded). CloudWatch ServiceLens provides a Service Map that visualizes the contextual linking of all your resources, along with an intuitive interface so you can dive deep into correlated monitoring data.
Amazon CloudWatch Synthetics allows you to monitor application endpoints more easily. It runs tests on your endpoints 24/7 and alerts you if they don’t behave as expected. These tests can be customized to check for availability, latency, transactions, broken or dead links, step-by-step task completions, page load errors, load latencies for UI assets, complex wizard flows, or checkout flows in your applications. You can also use CloudWatch Synthetics to isolate alarming application endpoints and map them back to underlying infrastructure issues to reduce MTTR. With this new feature, CloudWatch now collects canary traffic, which can continually verify your customer experience even when there is no customer traffic on your applications, enabling you to discover issues before your customers do. CloudWatch Synthetics supports monitoring your REST APIs, URLs, and website content, checking for unauthorized changes from phishing, code injection, and cross-site scripting.
Auto Scaling helps you automate capacity and resource planning. You can set a threshold to alarm on a key metric and trigger an automated Auto Scaling action. For example, you could set up an Auto Scaling workflow to add or remove EC2 instances based on CPU utilization metrics and optimize resource costs.
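One way to express CPU-based scaling is a target tracking policy, in which Amazon EC2 Auto Scaling manages the underlying CloudWatch alarms itself. This is a swapped-in variant of the threshold-and-action workflow described above, shown as a sketch. The Auto Scaling group name and the 50 percent target are hypothetical.

```python
def build_cpu_target_tracking_policy(group_name, target_cpu_percent=50.0):
    """Build keyword arguments for the EC2 Auto Scaling PutScalingPolicy action."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Average CPU utilization across the group's instances.
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


scaling_policy = build_cpu_target_tracking_policy("example-web-asg")
```

With boto3 available, the policy could be applied with boto3.client("autoscaling").put_scaling_policy(**scaling_policy).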
Automate response to operational changes with CloudWatch Events
CloudWatch Events provides a near real-time stream of system events that describe changes to your AWS resources. It allows you to respond quickly to operational changes and take corrective action. You simply write rules to indicate which events are of interest to your application and what automated actions to take when a rule matches an event. You can, for example, set a rule to invoke AWS Lambda functions or notify an Amazon Simple Notification Service (Amazon SNS) topic.
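Rules select events with a JSON event pattern. The sketch below defines a pattern for stopped EC2 instances and a deliberately simplified local matcher that shows how exact-value patterns select events. The matcher is an illustration only and does not reproduce the service's full matching behavior. The sample event and instance ID are hypothetical.

```python
import json

# Pattern: EC2 instance state-change events where the new state is "stopped".
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern field must list the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

rule_request = {
    "Name": "example-ec2-stopped",
    "EventPattern": json.dumps(EVENT_PATTERN),
    "State": "ENABLED",
}
```

With boto3 available, the rule could be created with boto3.client("events").put_rule(**rule_request), and a Lambda function or SNS topic attached as a target with put_targets.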
Alarm and automate actions on EKS, ECS, and k8s clusters
For Amazon EKS and k8s clusters, Container Insights allows you to alarm on compute metrics to trigger auto scaling policies on your Amazon EC2 Auto Scaling group and gives you the ability to stop, terminate, reboot, and recover any Amazon EC2 instance. For Amazon ECS clusters, you can use compute metrics from your tasks and services for Service Auto Scaling.
Granular data and extended retention
Amazon CloudWatch allows you to monitor trends and seasonality with 15 months of metric data (storage and retention). This lets you perform historical analysis to fine-tune resource utilization. With CloudWatch, you can also collect health metrics at up to one-second resolution, including custom metrics (such as those coming from your on-premises applications). Granular real-time data enables better visualization and the ability to spot and monitor trends to optimize application performance and operational health.
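Retention and granularity interact: older data points are kept at coarser periods. The sketch below encodes the retention schedule as described in the CloudWatch documentation at the time of writing (sub-minute data for 3 hours, 1-minute data for 15 days, 5-minute data for 63 days, and 1-hour data for 455 days, which is about 15 months). These tier values are an assumption that should be checked against current documentation, and the function name is hypothetical.

```python
# (maximum age in hours, finest period in seconds still available at that age)
RETENTION_TIERS = [
    (3, 1),            # high-resolution data: 3 hours
    (15 * 24, 60),     # 1-minute data: 15 days
    (63 * 24, 300),    # 5-minute data: 63 days
    (455 * 24, 3600),  # 1-hour data: 455 days (about 15 months)
]


def finest_available_period(age_hours):
    """Return the finest period (seconds) retained for data of the given age."""
    for max_age_hours, period_seconds in RETENTION_TIERS:
        if age_hours <= max_age_hours:
            return period_seconds
    return None  # older than the longest retention tier
```

For example, under these assumptions a data point from 30 days ago would be queryable at 5-minute granularity but no longer at 1-minute granularity.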
Custom operations on metrics
Amazon CloudWatch Metric Math enables you to perform calculations across multiple metrics for real-time analysis so you can easily derive insights from your existing CloudWatch metrics and better understand the operational health and performance of your infrastructure. You can visualize these computed metrics in the AWS Management Console, add them to CloudWatch dashboards, or retrieve them using the GetMetricData API action. Metric Math supports arithmetic operations (such as +, -, /, and *) and mathematical functions (such as Sum, Average, Min, Max, and Standard Deviation).
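As an illustration of a computed metric, the sketch below builds GetMetricData parameters that derive a Lambda error rate from two existing metrics with an arithmetic expression. The function name, time window, and query IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_error_rate_request(function_name, end_time, minutes=60):
    """Build GetMetricData parameters that compute a Lambda error rate."""

    def lambda_metric(query_id, metric_name):
        return {
            "Id": query_id,
            "ReturnData": False,  # input to the expression, not returned itself
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
        }

    return {
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "MetricDataQueries": [
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {
                "Id": "error_rate",
                # Metric Math expression over the two query IDs above.
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
    }


# Hypothetical function name and a fixed end time for reproducibility.
math_request = build_error_rate_request(
    "example-function", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
)
```

With boto3 available, the computed series could be retrieved with boto3.client("cloudwatch").get_metric_data(**math_request).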
Amazon CloudWatch Logs Insights enables you to derive actionable intelligence from your logs to address operational issues without needing to provision servers or manage software. You can instantly begin writing queries with aggregations, filters, and regular expressions. In addition, you can visualize time-series data, drill down into individual log events, and export query results to CloudWatch dashboards. This gives you complete operational visibility. With a few clicks in the AWS Management Console, you can start using Logs Insights to query logs sent to CloudWatch. You pay only for the queries you run.
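A query combining a filter, a regular expression, and an aggregation might look like the sketch below, which builds parameters for the CloudWatch Logs StartQuery action. The log group name and time range are hypothetical, and the query counts messages containing ERROR in fixed time bins.

```python
def build_error_count_query(log_group, start_epoch_s, end_epoch_s, bin_minutes=5):
    """Build keyword arguments for the CloudWatch Logs StartQuery action."""
    query = (
        "fields @timestamp, @message"
        " | filter @message like /ERROR/"
        f" | stats count(*) as error_count by bin({bin_minutes}m)"
    )
    return {
        "logGroupName": log_group,
        "startTime": start_epoch_s,  # seconds since the Unix epoch
        "endTime": end_epoch_s,
        "queryString": query,
        "limit": 100,
    }


# Hypothetical log group and a one-hour window.
query_request = build_error_count_query("/example/app", 1_700_000_000, 1_700_003_600)
```

With boto3 available, the query could be started with boto3.client("logs").start_query(**query_request) and its results fetched afterwards with get_query_results using the returned query ID.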
Analyze container metrics, logs, and traces
Container Insights simplifies the analysis of observability data from metrics, logs, and traces by providing deep links from automatic dashboards to granular performance events, application logs (stdout/stderr), custom logs, predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs, and Amazon EKS control plane logs, all of which can be queried with the CloudWatch Logs Insights advanced query language.
Analyze Lambda metrics, logs, and traces
Lambda Insights simplifies the analysis of observability data from metrics, logs, and traces by providing deep links from automatic dashboards to granular performance events, application logs, and custom logs, all of which can be queried with the CloudWatch Logs Insights advanced query language.
Amazon CloudWatch now includes Contributor Insights, which analyzes time-series data to provide a view of the top contributors influencing system performance. Once set up, Contributor Insights runs continuously without additional user intervention. This helps developers and operators more quickly isolate, diagnose, and remediate issues during an operational event. Contributor Insights helps you understand who or what is impacting your system and application performance, such as a specific resource, customer account, or API call. This enables you to pinpoint outliers, find the heaviest traffic patterns, and rank the most-used system processes. You can create Contributor Insights rules to evaluate patterns in structured log events as they are sent to CloudWatch Logs, including logs from AWS services such as AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC), Amazon API Gateway, and any custom logs sent by your service or on-premises servers, such as Apache access logs. Contributor Insights evaluates these log events in near real time and displays reports that show the top contributors and the number of unique contributors in a dataset. A contributor is an aggregate metric based on dimensions contained as log fields in CloudWatch Logs, such as account-id or interface-id in VPC Flow Logs, or any other custom set of dimensions. You can sort and filter contributor data based on your own custom criteria. Contributor Insights report data can be displayed on CloudWatch dashboards, graphed alongside CloudWatch metrics, and added to CloudWatch alarms.
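The idea of a contributor report can be sketched locally: group structured log events by one or more fields, count occurrences, and report the top entries along with the number of unique contributors. The code below illustrates the concept only and is not how the service is implemented. The sample events and the function name are hypothetical.

```python
from collections import Counter


def top_contributors(log_events, key_fields, top_n=3):
    """Return (top contributors with counts, number of unique contributors)."""
    counts = Counter(
        tuple(event[field] for field in key_fields)
        for event in log_events
        # Skip events that lack any of the requested dimensions.
        if all(field in event for field in key_fields)
    )
    return counts.most_common(top_n), len(counts)


# Hypothetical flow-log-like events keyed by network interface.
events = (
    [{"interface-id": "eni-a", "account-id": "111111111111"}] * 3
    + [{"interface-id": "eni-b", "account-id": "111111111111"}] * 2
    + [{"interface-id": "eni-c", "account-id": "222222222222"}]
    + [{"account-id": "333333333333"}]  # no interface-id, so it is ignored
)

report, unique_count = top_contributors(events, ["interface-id"], top_n=2)
```

Passing several fields, for example ["account-id", "interface-id"], treats each distinct combination as one contributor.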
Amazon CloudWatch Metrics Insights is a fast, flexible, SQL-based query engine that enables you to identify trends and patterns within millions of operational metrics in near real time. Metrics Insights allows you to gain better visibility on your infrastructure and large-scale application performance with flexible querying and on-the-fly metric aggregations. Metrics Insights queries can be used to create powerful visualizations, helping you proactively monitor and pinpoint issues quickly, and reduce MTTR.
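A Metrics Insights query is a SQL-style string. The sketch below assembles one that ranks EC2 instances by average CPU utilization. The helper function and its defaults are assumptions made for this example.

```python
def build_metrics_insights_query(namespace, metric, group_by, func="AVG", limit=10):
    """Assemble a Metrics Insights query ranking groups by an aggregate."""
    return (
        f'SELECT {func}({metric}) FROM SCHEMA("{namespace}", {group_by}) '
        f"GROUP BY {group_by} ORDER BY {func}() DESC LIMIT {limit}"
    )


# Top 10 EC2 instances by average CPU utilization.
insights_query = build_metrics_insights_query("AWS/EC2", "CPUUtilization", "InstanceId")
```

A query string like this can be run in the CloudWatch console, or supplied as the Expression of a GetMetricData query together with a Period.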
Amazon CloudWatch Evidently lets application developers conduct experiments and identify unintended consequences of new features before rolling them out for general use, thereby reducing risk related to new feature roll-out. Evidently allows you to validate new features across the full application stack before release, which makes for a safer release. When launching new features, you can expose them to a small user base, monitor key metrics such as page load times or conversions, and then dial up traffic. Evidently also allows you to try different designs, collect user data, and release the most effective design in production.
Compliance and Security
Amazon CloudWatch is integrated with AWS Identity and Access Management (IAM) so you can control which users and resources have permission to access your data and how they can access it.
Amazon CloudWatch Logs is also PCI and FedRAMP compliant. Data is encrypted at rest and in transit. You can also use AWS Key Management Service (AWS KMS) encryption to encrypt your log groups for added compliance and security.