Amazon CloudWatch Features

Overview

Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, on-premises, hybrid, and other cloud applications and infrastructure resources. You can collect and access all your performance and operational data in the form of logs and metrics from a single platform rather than monitoring them in silos (server, network, or database). CloudWatch enables you to monitor your complete stack (applications, infrastructure, network, and services) and use alarms, logs, and events data to take automated actions and reduce mean time to resolution (MTTR). This frees up important resources and allows you to focus on building applications and business value.

CloudWatch gives you actionable insights that help you optimize application performance, manage resource utilization, and understand system-wide operational health. CloudWatch provides up to one-second visibility of metrics and logs data, 15 months of data retention (metrics), and the ability to perform calculations on metrics. This allows you to perform historical analysis for cost optimization and derive real-time insights into optimizing applications and infrastructure resources. You can use CloudWatch Container Insights to monitor, troubleshoot, and alarm on your containerized applications and microservices. CloudWatch collects, aggregates, and summarizes compute utilization information such as CPU, memory, disk, and network data, as well as diagnostic information such as container restart failures, to help DevOps engineers isolate issues and resolve them quickly. Container Insights gives you insights from container management services such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, and standalone Kubernetes (k8s).

Collect

There are two log classes:
 

  1. Amazon CloudWatch Logs Infrequent Access (Logs-IA) is purpose-built for consolidating all your logs natively on AWS. It offers the managed ingestion, cross-account log analytics, and encryption of CloudWatch Logs Standard, with a low per-GB ingestion price. This combination of tailored capabilities and low cost makes CloudWatch Logs-IA ideal for ad hoc querying and after-the-fact forensic analysis.
  2. Amazon CloudWatch Logs Standard provides comprehensive log management and is intended for real-time monitoring and advanced analytics capabilities such as Live Tail, metric extraction, alarming, and data protection.

The Amazon CloudWatch Logs service allows you to collect and store logs from your resources, applications, and services in near real time. There are three main categories of logs:

1) Vended logs. These are natively published by AWS services on your behalf. Currently, Amazon VPC Flow Logs and Amazon Route 53 logs are the two supported types.

2) Logs published by AWS services. Currently, more than 30 AWS services publish logs to CloudWatch. They include Amazon API Gateway, AWS Lambda, AWS CloudTrail, and many others.

3) Custom logs. These are logs from your own application and on-premises resources, and from other clouds.

You can use AWS Systems Manager to install the CloudWatch agent, or you can use the PutLogEvents API action to publish logs.
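
As a rough illustration of publishing custom logs through the API, the sketch below builds a batch in the shape that the CloudWatch Logs PutLogEvents operation expects (exposed in boto3 as `put_log_events`). The helper names `build_log_events` and `publish` are hypothetical, and the call assumes that boto3 is installed, credentials are configured, and the log group and log stream already exist.

```python
import time
from typing import Dict, List, Optional


def build_log_events(messages: List[str], start_ms: Optional[int] = None) -> List[Dict]:
    """Turn plain strings into log events.

    Each event carries a millisecond epoch timestamp and a message, and the
    batch is kept in chronological order.
    """
    base = int(time.time() * 1000) if start_ms is None else start_ms
    # Space events 1 ms apart so the batch is already sorted by timestamp.
    return [{"timestamp": base + i, "message": m} for i, m in enumerate(messages)]


def publish(log_group: str, log_stream: str, messages: List[str]) -> None:
    """Send the batch to CloudWatch Logs (hypothetical helper, needs AWS access)."""
    import boto3  # imported lazily so build_log_events works without the SDK

    boto3.client("logs").put_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        logEvents=build_log_events(messages),
    )
```

For anything beyond occasional events, the CloudWatch agent is usually the simpler route, since it handles batching and retries for you.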

Amazon CloudWatch allows you to collect infrastructure metrics from more than 70 AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon ECS, AWS Lambda, and Amazon API Gateway, with no action on your part. For example, Amazon EC2 instances automatically publish CPU utilization, data transfer, and disk usage metrics to help you understand changes in state. You can use built-in metrics for API Gateway to detect latency, or use built-in metrics for AWS Lambda to detect errors or throttles. Likewise, Amazon CloudWatch allows you to collect application metrics (such as user activity, error metrics, or memory used) from your own applications to monitor operational performance, troubleshoot issues, and spot trends. You can use the CloudWatch agent or the PutMetricData API action to publish these metrics to CloudWatch. If you need more detailed metrics beyond the default infrastructure metrics, such as shard-level Amazon Kinesis Data Streams metrics, you can opt in per resource. Similarly, application metrics are available at up to one-second frequency and can be used in statistics, graphs, and alarms with high resolution.
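
A minimal sketch of publishing a custom application metric with PutMetricData (`put_metric_data` in boto3) might look like the following. The helper names, namespace, and dimension values are illustrative assumptions. Setting `StorageResolution` to 1 asks CloudWatch to store the metric at one-second resolution, while 60 is the standard resolution.

```python
from typing import Dict, Optional


def build_metric_datum(name: str, value: float, unit: str = "Count",
                       dimensions: Optional[Dict[str, str]] = None,
                       high_resolution: bool = False) -> Dict:
    """Build one entry for the MetricData list of PutMetricData."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        # 1 = high resolution (one-second), 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def publish_metric(namespace: str, datum: Dict) -> None:
    """Send a single datum (hypothetical helper, needs boto3 and AWS access)."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=[datum])
```

For example, `publish_metric("MyApp", build_metric_datum("ActiveUsers", 42, dimensions={"Service": "checkout"}, high_resolution=True))` would record one high-resolution data point in a hypothetical `MyApp` namespace.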

Container Insights simplifies the collection and aggregation of curated metrics and container ecosystem logs. It collects compute performance metrics such as CPU, memory, network, and disk information from each container as performance events and automatically generates custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch Logs with metadata about the running environment, such as the Amazon EC2 instance ID, Service, and Amazon Elastic Block Store (Amazon EBS) volume mount and ID, to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. Container Insights also provides an option to collect application logs (stdout/stderr), custom logs, predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs, and Amazon EKS control plane logs. For Amazon EKS and k8s clusters, a preconfigured Fluentd agent can be used to collect your logs. See the Container Insights logs setup documentation for more details. For Amazon ECS, the Amazon CloudWatch Logs logging driver or Fluent Bit can be used to collect application logs.

CloudWatch Lambda Insights simplifies the collection and aggregation of curated metrics and logs from AWS Lambda functions. It collects compute performance metrics such as CPU, memory, and network from each Lambda function as performance events, while automatically generating custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch logs to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. See the Lambda Insights getting started documentation for more details.

Amazon CloudWatch Metric Streams enables you to create continuous, near-real-time streams of metrics to a destination of your choice. This makes it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream including the most up-to-date CloudWatch metrics data to power dashboards, alarms, and other tools that rely on accurate and timely metric data. Easily direct your metrics to your data lake on AWS (such as on Amazon S3) and start analyzing usage or performance with tools such as Amazon Athena.

Monitor

Cross-account observability in CloudWatch helps you monitor and troubleshoot applications that span multiple accounts within a Region. You can search for log groups stored across multiple accounts from a central view, run cross-account Logs Insights queries and create Contributor Insights rules across accounts to identify top-N contributors generating log entries. You can also visualize metrics from many accounts in a consolidated view, and create alarms that evaluate metrics from other accounts to be notified of anomalies and trending issues. With cross-account observability in CloudWatch, you can view an interactive map of your cross-account applications using ServiceLens with one-step drill downs to relevant metrics, logs, and traces. Cross-account observability in CloudWatch delivers a holistic operational view in just a few steps without requiring additional data pipelines—saving you time, effort, and cost managing your infrastructure and applications.

Amazon CloudWatch dashboards enable you to create reusable graphs and visualize your cloud resources and applications in a unified view. You can graph metrics and logs data side by side in a single dashboard to quickly get the context and move from diagnosing the problem to understanding the root cause. For example, you can visualize key metrics, such as CPU utilization and memory, and compare them to capacity. You can also correlate the log pattern of a specific metric and set alarms to alert you to performance and operational issues. This gives you system-wide visibility into operational health and the ability to quickly troubleshoot issues, reducing MTTR.
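
To make the idea of placing metrics and logs side by side concrete, here is a hedged sketch that assembles a dashboard body with one metric widget and one log widget. A body like this can be passed to the PutDashboard operation (`put_dashboard` in boto3). The helper names, instance ID, log group name, and Region are placeholders chosen for illustration.

```python
import json
from typing import Dict, List


def metric_widget(title: str, namespace: str, metric: str, dim_name: str,
                  dim_value: str, region: str = "us-east-1") -> Dict:
    """A graph of one metric, identified by namespace, name, and one dimension."""
    return {
        "type": "metric",
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": [[namespace, metric, dim_name, dim_value]],
            "stat": "Average",
            "period": 300,
            "region": region,
        },
    }


def log_widget(title: str, log_group: str, region: str = "us-east-1") -> Dict:
    """A table of recent log events, driven by a Logs Insights query."""
    query = f"SOURCE '{log_group}' | fields @timestamp, @message | sort @timestamp desc | limit 20"
    return {
        "type": "log",
        "width": 12,
        "height": 6,
        "properties": {"title": title, "query": query, "region": region, "view": "table"},
    }


def dashboard_body(widgets: List[Dict]) -> str:
    """Serialize widgets into the JSON string used as the dashboard body."""
    return json.dumps({"widgets": widgets})


body = dashboard_body([
    metric_widget("CPU", "AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"),
    log_widget("Recent app logs", "/my-app/production"),
])
# With boto3 and credentials available:
# boto3.client("cloudwatch").put_dashboard(DashboardName="my-app", DashboardBody=body)
```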

Amazon CloudWatch supports querying multiple data sources to help you monitor and troubleshoot hybrid and multicloud workloads so you can detect and resolve issues faster. You can query and combine metrics from sources such as Amazon OpenSearch Service, Prometheus, Azure Monitor, and your own custom data sources in real time, which increases visibility into your application health and helps you resolve critical events faster. To set up your own data source, you provide an AWS Lambda function.

With Amazon CloudWatch composite alarms, you can combine multiple alarms and reduce alarm noise. If an issue affects several resources in an application, you will receive a single alarm notification for the entire application instead of one for each affected resource. This helps you focus on finding the root cause of operational issues to reduce application downtime. You can provide an overall state for a grouping of resources, such as an application, AWS Region, or Availability Zone.
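
A composite alarm is defined by a rule expression over the states of other alarms. The sketch below builds an "any child is in ALARM" rule. The helper name and alarm names are hypothetical, and the resulting string is what you would pass as `AlarmRule` to the PutCompositeAlarm operation (`put_composite_alarm` in boto3).

```python
from typing import List


def any_in_alarm(alarm_names: List[str]) -> str:
    """Build a rule that fires when at least one child alarm is in ALARM state."""
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


rule = any_in_alarm(["web-cpu-high", "db-cpu-high", "queue-depth-high"])
# With boto3 and credentials available:
# boto3.client("cloudwatch").put_composite_alarm(AlarmName="my-app-health", AlarmRule=rule)
```

Because notifications are attached to the composite alarm rather than to each child, an incident that trips all three child alarms still produces a single notification.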

Amazon CloudWatch alarms allow you to set a threshold on metrics and trigger an action. You can create high-resolution alarms, set a percentile as the statistic, and either specify an action or ignore as appropriate. For example, you can create alarms on Amazon EC2 metrics, set notifications, and take one or more actions to detect and shut down unused or underutilized instances. Real-time alarming on metrics and events enables you to minimize downtime and potential business impact.
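
As an illustration of a percentile-based, high-resolution alarm, the following sketch assembles keyword arguments for the PutMetricAlarm operation (`put_metric_alarm` in boto3). The helper name, metric names, threshold, and SNS topic ARN are assumptions for the example. A period of 10 or 30 seconds only makes sense for metrics that were published at high resolution.

```python
from typing import Dict, List, Optional


def build_alarm(name: str, namespace: str, metric: str, threshold: float,
                percentile: Optional[float] = None, period: int = 60,
                actions: Optional[List[str]] = None) -> Dict:
    """Return keyword arguments for PutMetricAlarm."""
    params = {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Period": period,  # seconds; 10 or 30 makes this a high-resolution alarm
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": actions or [],
    }
    if percentile is None:
        params["Statistic"] = "Average"
    else:
        # Percentiles use ExtendedStatistic (for example "p99"), not Statistic.
        params["ExtendedStatistic"] = f"p{percentile:g}"
    return params


alarm = build_alarm(
    "latency-p99-high", "MyApp", "Latency", threshold=500, percentile=99, period=10,
    actions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder ARN
)
# With boto3 and credentials available:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```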

Applications and infrastructure resources generate large amounts of operational and monitoring data in the form of logs and metrics. In addition to letting you access and visualize these datasets in a single platform, Amazon CloudWatch also makes it easy to correlate them. This helps you quickly move from diagnosing the problem to understanding the root cause. For example, you can correlate a log pattern, such as an error, with a specific metric and set alarms to alert you to performance and operational issues.

Amazon CloudWatch Application Insights provides automated setup of observability for your enterprise applications so you can get visibility into their health. It helps you identify and set up key metrics and logs across your application resources and technology stack, such as database, web (IIS) and application servers, operating system, load balancers, and queues. It constantly monitors this telemetry data to detect and correlate anomalies and errors to notify you of any problems in your application. To aid in troubleshooting, it creates automated dashboards for the detected problems with correlated metric anomalies and log errors, along with additional insights to point you to their potential root cause. This enables you to take quick remedial actions to ensure that your applications are healthy and end users are not impacted.

Container Insights with enhanced observability for EKS

Container Insights now delivers detailed EKS metrics, such as container-level performance metrics, kube-state metrics, and EKS control plane metrics, out of the box. You can visually drill down and up across container layers to spot issues such as memory leaks in individual containers. Container Insights also shows you a list of container layers consuming high levels of resources, so that you can identify risks in your environment, even if you have not yet set up alarms, and take proactive action before your end-user experience is impacted. Container Insights with enhanced observability for Amazon EKS comes with an easy getting-started experience: you can auto-instrument your clusters with the CloudWatch Observability add-on for EKS from your cluster details console and start ingesting telemetry immediately.

Container Insights without enhanced observability

CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices running on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Kubernetes platforms on Amazon EC2, and AWS Fargate (for both Amazon ECS and Amazon EKS). Container Insights collects container metrics such as CPU, memory, disk, and network metrics out of the box and provides deeper diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. Container Insights delivers your container observability in automatic dashboards enabling you to monitor your application health and performance easily. You can also set CloudWatch alarms on Container Insights metrics to be notified of anomalies before your application performance is impacted.

Internet Monitor provides visibility into how internet issues impact the performance and availability between your AWS-hosted applications and your end users, reducing the time it takes for you to diagnose these issues from days to minutes. You can explore measurements for different timeframes and at different geographic granularities, quickly visualize the impact of issues, and then take action to improve your end users' experience (for example, by switching to other AWS services or rerouting traffic to your workload through different AWS Regions). If the issue is caused by the AWS network, you'll automatically receive an AWS Health Dashboard notification that tells you the steps that AWS is taking to mitigate the problem. Internet Monitor delivers measurements to CloudWatch metrics and CloudWatch Logs, to easily support integrating health information for geographies and networks specific to your application. Internet Monitor also sends health events to Amazon EventBridge, so you can set up notifications. Internet Monitor monitors your application through Amazon Virtual Private Clouds (VPCs), Amazon CloudFront distributions, and Amazon WorkSpaces directories.

Lambda Insights provides automatic dashboards in the CloudWatch console. These dashboards summarize the compute performance and errors. Each dashboard includes the list of metrics for the selected time window and allows you to dive deeper contextually (based on time window and selected function) into application logs, AWS X-Ray traces, and performance events.

Amazon CloudWatch Anomaly Detection applies machine-learning (ML) algorithms to continuously analyze metric data and identify anomalous behavior. It allows you to create alarms that auto-adjust thresholds based on natural metric patterns, such as time of day, day of week, seasonality, or changing trends. You can also visualize metrics with anomaly detection bands on dashboards. This enables you to monitor, isolate, and troubleshoot unexpected changes in your metrics.
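
An anomaly detection alarm compares a metric against a band expression instead of a fixed threshold. The sketch below builds PutMetricAlarm arguments in which `ANOMALY_DETECTION_BAND(m1, 2)` defines a band two standard deviations wide around the expected value. The helper name, alarm name, and metric are illustrative assumptions.

```python
from typing import Dict


def build_anomaly_alarm(name: str, namespace: str, metric: str,
                        band_width: float = 2, period: int = 300) -> Dict:
    """Return PutMetricAlarm arguments for an alarm on an anomaly detection band."""
    return {
        "AlarmName": name,
        # Fire when the metric leaves the band in either direction.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "ad1",  # the band below acts as the threshold
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric},
                    "Period": period,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            {
                "Id": "ad1",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width:g})",
                "Label": f"{metric} (expected)",
                "ReturnData": True,
            },
        ],
    }


params = build_anomaly_alarm("orders-anomaly", "MyApp", "OrdersPerMinute")
# With boto3 and credentials available:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

A wider band (a larger second argument) tolerates more deviation before the alarm fires.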

You can use Amazon CloudWatch ServiceLens to visualize and analyze the health, performance, and availability of your applications in a single place. It ties together CloudWatch metrics and logs as well as traces from AWS X-Ray to give you a complete view of your applications and their dependencies. Quickly pinpoint performance bottlenecks, isolate root causes of application issues, and determine the impact on users. CloudWatch ServiceLens lets you gain visibility into your applications in three main areas: infrastructure monitoring (using metrics and logs to understand the resources supporting your applications), transaction monitoring (using traces to understand dependencies between your resources), and end-user monitoring (using canaries to monitor your endpoints and notify you when the end-user experience has degraded). CloudWatch ServiceLens provides a Service Map that visualizes the contextual linking of all your resources, along with an intuitive interface so you can dive deep into correlated monitoring data.

Amazon CloudWatch Synthetics allows you to monitor application endpoints more easily. It runs tests on your endpoints 24/7 and alerts you if they don’t behave as expected. These tests can be customized to check for availability, latency, transactions, broken or dead links, step-by-step task completions, page load errors, load latencies for UI assets, complex wizard flows, or checkout flows in your applications. You can also use CloudWatch Synthetics to isolate alarming application endpoints and map them back to underlying infrastructure issues to reduce MTTR. With this new feature, CloudWatch now collects canary traffic, which can continually verify your customer experience even when there is no customer traffic on your applications, enabling you to discover issues before your customers do. CloudWatch Synthetics supports monitoring your REST APIs, URLs, and website content, checking for unauthorized changes from phishing, code injection, and cross-site scripting.

Amazon CloudWatch RUM gives you visibility into your applications’ client-side performance and reduces MTTR. It allows you to collect client-side data on web application performance in near real time to identify and debug issues. CloudWatch RUM complements the CloudWatch Synthetics data to give you more visibility into your end-user experience. You can visualize anomalies in performance and use the relevant debugging data (such as error messages, stack traces, and user sessions) to fix performance issues (such as JavaScript errors, crashes, and latencies). You can gain insight into the range of end-user impacts, including number of users, geolocations, and browsers. CloudWatch RUM aggregates data on your users' journey through your application, which can help you determine which features to launch and bug fixes to prioritize.

Act

Auto Scaling helps you automate capacity and resource planning. You can set a threshold to alarm on a key metric and trigger an automated Auto Scaling action. For example, you could set up an Auto Scaling workflow to add or remove EC2 instances based on CPU utilization metrics and optimize resource costs.

CloudWatch Events provides a near real-time stream of system events that describe changes to your AWS resources. It allows you to respond quickly to operational changes and take corrective action. You simply write rules to indicate which events are of interest to your application and what automated actions to take when a rule matches an event. You can, for example, set a rule to invoke AWS Lambda functions or notify an Amazon Simple Notification Service (Amazon SNS) topic.
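
A rule consists of an event pattern plus one or more targets. As a sketch, the code below builds a pattern that matches EC2 instance state changes and shows how it could be registered with the `put_rule` and `put_targets` calls of the boto3 `events` client. The helper names, rule name, and target ARN are hypothetical.

```python
import json
from typing import List


def ec2_state_pattern(states: List[str]) -> str:
    """Event pattern (as JSON text) matching EC2 instances entering the given states."""
    return json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": states},
    })


def create_rule(name: str, target_arn: str, states: List[str]) -> None:
    """Create the rule and attach one target (needs boto3 and AWS access)."""
    import boto3  # imported lazily so the pattern builder works without the SDK

    events = boto3.client("events")
    events.put_rule(Name=name, EventPattern=ec2_state_pattern(states), State="ENABLED")
    # The target can be, for example, a Lambda function ARN or an SNS topic ARN.
    events.put_targets(Rule=name, Targets=[{"Id": "1", "Arn": target_arn}])
```

The target resource also has to allow the events service to invoke it, which is configured on the target side and is not shown here.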

For Amazon EKS and k8s clusters, Container Insights allows you to alarm on compute metrics to trigger auto scaling policies on your Amazon EC2 Auto Scaling group and gives you the ability to stop, terminate, reboot, and recover any Amazon EC2 instance. For Amazon ECS clusters, you can use compute metrics from your tasks and services for Service Auto Scaling.  

 

Analyze

Amazon CloudWatch Logs Insights empowers you to unlock greater value from your log data. You can query logs sent to CloudWatch in the AWS console, or start writing queries with aggregations, filters, and regular expressions for complete operational visibility. In addition, you can visualize time-series data, drill down into individual log events, and export query results to CloudWatch Dashboards.
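
For a sense of what a query with filters, regular expressions, and aggregations looks like, the sketch below composes a Logs Insights query that counts matching events in five-minute bins and runs it through the `start_query` and `get_query_results` calls of the boto3 `logs` client. The helper names and the `ERROR` pattern are assumptions for illustration.

```python
import time
from typing import Dict, List


def error_count_query(pattern: str = "ERROR", bin_size: str = "5m") -> str:
    """Count log events whose message matches a regex, grouped into time bins."""
    return (
        "fields @timestamp, @message\n"
        f"| filter @message like /{pattern}/\n"
        f"| stats count(*) as errors by bin({bin_size})\n"
        "| sort errors desc"
    )


def run_query(log_group: str, query: str, start_s: int, end_s: int) -> List[List[Dict]]:
    """Start a query and poll until it finishes (needs boto3 and AWS access).

    start_s and end_s are epoch timestamps in seconds.
    """
    import boto3  # imported lazily so the query builder works without the SDK

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group, startTime=start_s, endTime=end_s, queryString=query
    )["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result["results"]
        time.sleep(1)
```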

Powered by generative AI, you can use natural language to query your logs (in preview) and quickly surface actionable insights, by asking questions such as “Show me the slowest Lambda functions”. You can describe in plain language the log data you need and CloudWatch automatically generates a tailored query, making it easy to analyze logs and surface insights faster no matter your level of expertise.

Powered by AI/ML, you can also speed up log investigation using CloudWatch Logs Anomaly Detection, which uses machine learning algorithms that have learned from decades of Amazon.com and AWS operational data at immense scale. With this feature, CloudWatch can recognize shared structures among log records, extract notable content and trends, and identify anomalies, helping you speed up MTTR without needing to set up configuration parameters.

With CloudWatch Logs Live Tail, you can interactively analyze streaming log data in real-time from a central view. Launch contextual queries to seamlessly transition from real-time log monitoring to deeper log analytics and accelerated incident investigation and resolution. Live Tail removes the need for custom solutions and consolidates critical logging capabilities to help you optimize time to detection and resolution.

Amazon CloudWatch Metrics Insights is a fast, flexible, SQL-based query engine that enables you to identify trends and patterns within millions of operational metrics in near real time. Metrics Insights allows you to gain better visibility on your infrastructure and large-scale application performance with flexible querying and on-the-fly metric aggregations. Metrics Insights queries can be used to create powerful visualizations, helping you proactively monitor and pinpoint issues quickly, and reduce MTTR. 
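
The following sketch shows the general shape of a Metrics Insights query: it ranks EC2 instances by average CPU utilization and wraps the query so it can be passed to the GetMetricData operation. The helper name and the choice of metric are illustrative.

```python
from typing import Dict, List


def top_cpu_query(limit: int = 10) -> str:
    """SQL-style query returning the busiest instances by average CPU utilization."""
    return (
        'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) '
        f"GROUP BY InstanceId ORDER BY AVG() DESC LIMIT {limit}"
    )


def as_metric_data_queries(query: str, period: int = 300) -> List[Dict]:
    """Wrap the query for the MetricDataQueries parameter of GetMetricData."""
    return [{"Id": "q1", "Expression": query, "Period": period}]


queries = as_metric_data_queries(top_cpu_query(5))
# With boto3 and credentials available (start and end are datetime objects):
# boto3.client("cloudwatch").get_metric_data(
#     MetricDataQueries=queries, StartTime=start, EndTime=end)
```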

Amazon CloudWatch allows you to monitor trends and seasonality with 15 months of metric data (storage and retention). This lets you perform historical analysis to fine-tune resource utilization. With CloudWatch, you can also collect health metrics at a granularity of up to one second, including custom metrics (such as those coming from your on-premises applications). Granular real-time data enables better visualization and the ability to spot and monitor trends to optimize application performance and operational health.

Amazon CloudWatch Metric Math enables you to perform calculations across multiple metrics for real-time analysis so you can easily derive insights from your existing CloudWatch metrics and better understand the operational health and performance of your infrastructure. You can visualize these computed metrics in the AWS Management Console, add them to CloudWatch dashboards, or retrieve them using the GetMetricData API action. Metric Math supports arithmetic operations (such as +, -, /, and *) and mathematical functions (such as Sum, Average, Min, Max, and Standard Deviation).
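
As a worked example of Metric Math, the sketch below defines two input metrics and one expression that derives an error rate from them, in the shape that GetMetricData accepts. The AWS Lambda `Errors` and `Invocations` metrics are used for illustration, and the helper names and query IDs are assumptions.

```python
from typing import Dict, List


def metric_stat(query_id: str, namespace: str, metric: str,
                stat: str = "Sum", period: int = 300) -> Dict:
    """An input metric that feeds an expression but is not returned itself."""
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {"Namespace": namespace, "MetricName": metric},
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": False,
    }


def error_rate_queries() -> List[Dict]:
    """Error rate as a percentage, computed from two existing metrics."""
    return [
        metric_stat("errors", "AWS/Lambda", "Errors"),
        metric_stat("invocations", "AWS/Lambda", "Invocations"),
        {
            "Id": "error_rate",
            # The expression refers to the other queries by their Id.
            "Expression": "100 * errors / invocations",
            "Label": "Error rate (%)",
            "ReturnData": True,
        },
    ]

# With boto3 and credentials available (start and end are datetime objects):
# boto3.client("cloudwatch").get_metric_data(
#     MetricDataQueries=error_rate_queries(), StartTime=start, EndTime=end)
```

Only the expression has `ReturnData` set to true, so the response contains the derived series rather than the raw inputs.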

Lambda Insights simplifies the analysis of observable data from metrics, logs, and traces by simplifying deep linking from automatic dashboards to granular performance events, application logs, and custom logs, using the CloudWatch Logs Insights advanced query language.

CloudWatch Container Insights and Lambda Insights simplify the analysis of observability data from metrics, logs, and traces by providing deep links from automatic dashboards to granular performance events, application logs (stdout/stderr), and custom logs, using the CloudWatch Logs Insights advanced query language. Container Insights additionally factors in predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs, and Amazon EKS control plane logs.

Powered by generative AI, you can also use natural language to query the metrics and logs (in preview) observed on your containers and serverless applications running on AWS Lambda, by asking questions such as “Show me the slowest Lambda functions”. This helps you analyze telemetry and surface insights faster no matter your level of expertise.

Amazon CloudWatch now includes Contributor Insights, which analyzes time-series data to provide a view of the top contributors influencing system performance. Once set up, Contributor Insights runs continuously without additional user intervention. This helps developers and operators more quickly isolate, diagnose, and remediate issues during an operational event. Contributor Insights helps you understand who or what is impacting your system and application performance, such as a specific resource, customer account, or API call. This enables you to pinpoint outliers, find the heaviest traffic patterns, and rank the most-used system processes. You can create Contributor Insights rules to evaluate patterns in structured log events as they are sent to CloudWatch Logs, including logs from AWS services, such as AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC), and Amazon API Gateway, and any custom logs sent by your service, on-premises servers (such as Apache access logs), or other clouds. Contributor Insights evaluates these log events in near real time and displays reports that show the top contributors and the number of unique contributors in a dataset. A contributor is an aggregate metric based on dimensions contained as log fields in CloudWatch Logs, such as account-id or interface-id in VPC Flow Logs, or any other custom set of dimensions. You can sort and filter contributor data based on your own custom criteria. Contributor Insights report data can be displayed on CloudWatch dashboards, graphed alongside CloudWatch metrics, and added to CloudWatch alarms.
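
To illustrate what a Contributor Insights rule contains, here is a sketch of a rule definition for JSON-formatted logs that ranks contributors by request count. The helper name, log group name, and `$.clientIp` field are hypothetical. The resulting JSON text is what the PutInsightRule operation (`put_insight_rule` in boto3) takes as `RuleDefinition`.

```python
import json
from typing import List


def top_contributors_rule(log_groups: List[str], key_fields: List[str]) -> str:
    """Rule definition that counts log events per unique combination of key fields."""
    return json.dumps({
        "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
        "LogGroupNames": log_groups,
        "LogFormat": "JSON",
        # Each distinct value of the key fields is one contributor.
        "Contribution": {"Keys": key_fields, "Filters": []},
        "AggregateOn": "Count",
    })


definition = top_contributors_rule(["/my-app/access-logs"], ["$.clientIp"])
# With boto3 and credentials available:
# boto3.client("cloudwatch").put_insight_rule(
#     RuleName="top-client-ips", RuleState="ENABLED", RuleDefinition=definition)
```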

Amazon CloudWatch Evidently lets application developers conduct experiments and identify unintended consequences of new features before rolling them out for general use, thereby reducing risk related to new feature roll-out. Evidently allows you to validate new features across the full application stack before release, which makes for a safer release. When launching new features, you can expose them to a small user base, monitor key metrics such as page load times or conversions, and then dial up traffic. Evidently also allows you to try different designs, collect user data, and release the most effective design in production. 

 

Compliance and Security

Amazon CloudWatch is integrated with AWS Identity and Access Management (IAM) so you can control which users and resources have permission to access your data and how they can access it.

Amazon CloudWatch Logs is also PCI and FedRAMP compliant. Data is encrypted at rest and in transit. You can also use AWS Key Management Service (AWS KMS) encryption to encrypt your log groups for added compliance and security.

Amazon CloudWatch Logs data protection helps you to define data protection policies that can discover and protect sensitive data logged by systems and applications. This feature automatically identifies and masks sensitive information in your logs using ML and pattern matching based on the policy that you define. Data protection can help you streamline your architecture by offloading data protection logic from your applications, while helping support your compliance objectives. You can define your data protection policies to scan logs as they are ingested to determine how much sensitive data they contain and mask sensitive data that is detected. Masked data can also be unmasked for validation by security engineers through elevated privileges with IAM.
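
A data protection policy is a JSON document that pairs an audit statement with a masking statement over the same set of data identifiers. The sketch below builds one for email addresses. The helper name, policy name, and log group are hypothetical, and the document is meant to be passed to the `put_data_protection_policy` call of the boto3 `logs` client.

```python
import json
from typing import List

# Prefix of the AWS managed data identifiers used in data protection policies.
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def masking_policy(name: str, identifiers: List[str]) -> str:
    """Policy that audits and masks the given managed data identifiers."""
    arns = [IDENTIFIER_PREFIX + i for i in identifiers]
    return json.dumps({
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            # Audit: report findings (no findings destination configured here).
            {"Sid": "audit", "DataIdentifier": arns,
             "Operation": {"Audit": {"FindingsDestination": {}}}},
            # Deidentify: mask the matched data when logs are read.
            {"Sid": "redact", "DataIdentifier": arns,
             "Operation": {"Deidentify": {"MaskConfig": {}}}},
        ],
    })


policy = masking_policy("mask-emails", ["EmailAddress"])
# With boto3 and credentials available:
# boto3.client("logs").put_data_protection_policy(
#     logGroupIdentifier="/my-app/production", policyDocument=policy)
```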