Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. With CloudWatch, you can collect and access all your performance and operational data in form of logs and metrics from a single platform. This allows you to overcome the challenge of monitoring individual systems and applications in silos (server, network, database, etc.). CloudWatch enables you to monitor your complete stack (applications, infrastructure, and services) and leverage alarms, logs, and events data to take automated actions and reduce Mean Time to Resolution (MTTR). This frees up important resources and allows you to focus on building applications and business value.
CloudWatch gives you actionable insights that help you optimize application performance, manage resource utilization, and understand system-wide operational health. CloudWatch provides up to 1-second visibility of metrics and logs data, 15 months of data retention (metrics), and the ability to perform calculations on metrics. This allows you to perform historical analysis for cost optimization and derive real-time insights into optimizing applications and infrastructure resources.
You can use CloudWatch Container Insights to monitor, troubleshoot, and alarm on your containerized applications and microservices. CloudWatch collects, aggregates, and summarizes compute utilization information like CPU, memory, disk, and network data, as well as diagnostic information like container restart failures, to help DevOps engineers isolate issues and resolve them quickly. Container Insights gives you insights from container management services such as Amazon ECS for Kubernetes (EKS), Amazon’s Elastic Container Service (ECS), AWS Fargate, and standalone Kubernetes (k8s).
Easily collect and store logs
The Amazon CloudWatch Logs service allows you to collect and store logs from your resources, applications, and services in near real-time. There are three main categories of logs 1) Vended logs. These are natively published by AWS services on behalf of the customer. Currently Amazon VPC Flow Logs and Amazon Route 53 logs are the two supported types. 2) Logs that are published by AWS services. Currently over 30 AWS services publish logs to CloudWatch. These services include Amazon API Gateway, AWS Lambda, AWS CloudTrail, and many others. 3) Custom logs. These are logs from your own application and on-premises resources. You can use AWS Systems Manager to install a CloudWatch Agent, or you can use the PutLogData API action to easily publish logs.
Collecting metrics from distributed applications (such as those built using microservices architectures) is time consuming. Amazon CloudWatch allows you to collect default metrics from more than 70 AWS services, such as Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, AWS Lambda, and Amazon API Gateway, without any action on your part. For example, EC2 instances automatically publish CPU utilization, data transfer, and disk usage metrics to help you understand changes in state. You can use one of seven built-in metrics for API Gateway to detect latency or leverage one of eight built-in metrics for AWS Lambda to detect errors and throttles. If you need more detailed metrics beyond the default metrics, such as shard-level Amazon Kinesis Data Streams metrics, then you can simply opt-in per resource.
Amazon CloudWatch allows you to collect custom metrics from your own applications to monitor operational performance, troubleshoot issues, and spot trends. User activity is an example of a custom metric you can collect and monitor over a period of time. You can use CloudWatch Agent or the PutMetricData API action to publish these metrics to CloudWatch. All the same CloudWatch functionality will be available at up to one-second frequency for your own custom metrics data, including statistics, graphs, and alarms.
Collect and aggregate container metrics and logs
Container Insights simplifies the collection and aggregation of curated metrics and container ecosystem logs. It collects compute performance metrics such as CPU, memory, network, and disk information from each container as performance events and automatically generates custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch Logs with metadata about the running environment such as the Amazon EC2 instance ID, Service, Amazon EBS volume mount and ID, etc., to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. Container Insights also provides an option to collect application logs (stdout/stderr), custom logs, predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs and Amazon EKS control plane logs. For Amazon EKS and k8s clusters, a preconfigured FluentD agent can be used to collect your logs. See the Container Insights logs setup documentation for more details. For Amazon ECS, the Amazon CloudWatch Logs logging driver or Fluent Bit can be used to collect application logs.
Collect and aggregate Lambda metrics and logs
CloudWatch Lambda Insights simplifies the collection and aggregation of curated metrics and logs from AWS Lambda functions. It collects compute performance metrics such as CPU, memory, and network from each Lambda function as performance events, while automatically generating custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch logs to simplify monitoring and troubleshooting. CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. See the Lambda Insights getting started documentation for more details.
Unified operational view with dashboards
Amazon CloudWatch dashboards enable you to create re-usable graphs and visualize your cloud resources and applications in a unified view. You can graph metrics and logs data side by side in a single dashboard to quickly get the context and go from diagnosing the problem to understanding the root cause. For example, you can visualize key metrics, like CPU utilization and memory, and compare them to capacity. You can also correlate the log pattern of a specific metric and set alarms to be proactively alerted about performance and operational issues. This gives you system-wide visibility into operational health and the ability to quickly troubleshoot issues, reducing Mean Time to Resolution (MTTR).
Amazon CloudWatch composite alarms allow you to combine multiple alarms and reduce alarm noise. If an application issue affects several resources in an application, you will receive a single alarm notification for the entire application instead of one for each affected service component or resource. This helps you stay focused on finding the root cause of operational issues to reduce application downtime. You can provide an overall state for a grouping of resources like an application, AWS Region, or Availability Zone.
High resolution alarms
Amazon CloudWatch alarms allow you to set a threshold on metrics and trigger an action. You can create high-resolution alarms, set a percentile as the statistic, and either specify an action or ignore as appropriate. For example, you can create alarms on Amazon EC2 metrics, set notifications, and take one or more actions to detect and shut down unused or underutilized instances. Real-time alarming on metrics and events enables you to minimize downtime and potential business impact.
Logs and metrics correlation
Applications and infrastructure resources generate lots of operational and monitoring data in form of logs and metrics. In addition to providing ability to access and visualize these data sets in a single platform, Amazon CloudWatch also makes it easy to correlate metrics and logs. This helps you quickly go from diagnosing the problem to understanding the root cause. For example, you can correlate a log pattern, such as an error to a specific metric, and set alarms to be actively alerted of performance and operational issues.
Application Insights for .NET and SQL Server applications
Amazon CloudWatch Application Insights for .NET and SQL Server enables you to easily monitor .NET and SQL Server applications, so you can get visibility into the health of such applications. It helps identify and set up key metrics and logs across your application resources and technology stack i.e. database, web (IIS) and application servers, Operating System, load balancers, queues, etc. It constantly monitors these telemetry data to detect and correlate anomalies and errors, to notify you of any problems in your application. To aid in troubleshooting, it creates automated dashboards for the detected problems with correlated metric anomalies and log errors, along with additional insights to point you to their potential root-cause. This enables you to take quick remedial actions to ensure that your applications are healthy and end-users are not impacted.
Container monitoring insights
Container Insights provides automatic dashboards in the CloudWatch console. These dashboards summarize the compute performance, errors, and alarms by cluster, pod/task, and service. For Amazon EKS and k8s, dashboards are also available for nodes/EC2 instances and namespaces. Each dashboard summarizes the list of running pods/tasks or containers by CPU and memory for the selected time window, and allows you to contextually - based on time window and selected pod/task or container - dive deeper into application logs, AWS X-Ray traces, and performance events.
Lambda monitoring insights
Lambda Insights provides automatic dashboards in the CloudWatch console. These dashboards summarize the compute performance and errors. Each dashboard includes the list of metrics for the selected time window and allows you to contextually dive deeper — based on time window and selected function — into application logs, AWS X-Ray traces, and performance events.
Amazon CloudWatch Anomaly Detection applies machine-learning algorithms to continuously analyze data of a metric and identify anomalous behavior. It allows you to create alarms that auto-adjust thresholds based on natural metric patterns, such as time of day, day of week seasonality, or changing trends. You can also visualize metrics with anomaly detection bands on dashboards. This enables you to monitor, isolate, and troubleshoot unexpected changes in your metrics.
You can use Amazon CloudWatch ServiceLens to visualize and analyze the health, performance, and availability of your applications in a single place. CloudWatch ServiceLens ties together CloudWatch metrics and logs as well as traces from AWS X-Ray to give you a complete view of your applications and their dependencies. This enables you to quickly pinpoint performance bottlenecks, isolate root causes of application issues, and determine users impacted. CloudWatch ServiceLens enables you to gain visibility into your applications in three main areas: Infrastructure monitoring (using metrics and logs to understand the resources supporting your applications), transaction monitoring (using traces to understand dependencies between your resources), and end user monitoring (using canaries to monitor your endpoints and notify you when your end user experience has degraded). CloudWatch ServiceLens provides a Service Map that visualizes the contextual linking of all your resources, along with an intuitive interface so you can dive deep into correlated monitoring data.
Amazon CloudWatch Synthetics allows you to monitor application endpoints more easily. It runs tests on your endpoints every minute, 24x7, and alerts you as soon as your application endpoints don’t behave as expected. These tests can be customized to check for availability, latency, transactions, broken or dead links, step by step task completions, page load errors, load latencies for UI assets, complex wizard flows, or checkout flows in your applications. You can also use CloudWatch Synthetics to isolate alarming application endpoints and map them back to underlying infrastructure issues to reduce mean time to resolution. With this new feature, CloudWatch now collects canary traffic, which can continually verify your customer experience even when you don’t have any customer traffic on your applications, enabling you to discover issues before your customers do. CloudWatch Synthetics supports monitoring of your REST APIs, URLs, and website content, checking for unauthorized changes from phishing, code injection and cross-site scripting.
Auto Scaling helps you automate capacity and resource planning. You can set a threshold to alarm on a key metric and trigger an automated Auto Scaling action. For example, you could set up an Auto Scaling workflow to add or remove EC2 instances based on CPU utilization metrics and optimize resource costs.
Automate response to operational changes with CloudWatch Events
CloudWatch Events provides a near real-time stream of system events that describe changes to your AWS resources. It allows you to respond quickly to operational changes and take corrective action. You simply write rules to indicate which events are of interest to your application and what automated actions to take when a rule matches an event. You can, for example, set a rule to invoke AWS Lambda functions or notify an Amazon Simple Notification Service (SNS) topic.
Alarm and automate actions on EKS, ECS, and k8s clusters
For Amazon EKS and k8s clusters, Container Insights allows you to alarm on compute metrics to trigger auto scaling policies on your Amazon EC2 Auto Scaling group and provides you the ability to stop, terminate, reboot, and recover any Amazon EC2 instance. For Amazon ECS clusters, compute metrics from your tasks and services can be used for Service Auto Scaling.
Granular data and extended retention
Amazon CloudWatch allows you to monitor trends and seasonality with 15 months of metric data (storage and retention). This data allows you to perform historical analysis to fine-tune resource utilization. With CloudWatch, you can also collect up to 1 second of health metrics including custom ones, such as those coming from your on-premises applications. Granular real-time data enables better visualization and ability to spot and monitor trends to optimize application performance and operational health.
Custom operations on metrics
Amazon CloudWatch Metric Math enables you to perform calculations across multiple metrics for real-time analysis so you can easily derive insights from your existing CloudWatch metrics and better understand the operational health and performance of your infrastructure. You can visualize these computed metrics in the AWS Management Console, add them to CloudWatch dashboards, or retrieve them using the GetMetricData API action. Metric Math supports arithmetic operations such as +, -, /, *, and mathematical functions such as Sum, Average, Min, Max, and Standard Deviation.
Amazon CloudWatch Logs Insights enables you to drive actionable intelligence from your logs to address operational issues without needing to provision servers or manage software. You can instantly begin writing queries with aggregations, filters, and regular expressions. In addition, you can visualize timeseries data, drill down into individual log events, and export query results to CloudWatch Dashboards. This gives you complete operational visibility. With a few clicks in the AWS Management Console, you can start using Logs Insights to query logs sent to CloudWatch. You only pay for the queries you run.
Analyze container metrics, logs, and traces
Container Insights simplifies the analysis of observable data from metrics, logs, and traces by simplifying deep linking from automatic dashboards to granular performance events, application logs (stdout/stderr), custom logs, predefined Amazon EC2 instance logs, Amazon EKS/k8s data plane logs and Amazon EKS control plane logs using CloudWatch Logs Insights’ advance query language.
Analyze Lambda metrics, logs, and traces
Lambda Insights simplifies the analysis of observable data from metrics, logs, and traces by simplifying deep linking from automatic dashboards to granular performance events, application logs, and custom logs, using CloudWatch Logs Insights’ advanced query language.
Amazon CloudWatch now includes Contributor Insights, which analyzes time-series data to provide a view of the top contributors influencing system performance. Once set up, Contributor Insights runs continuously without needing additional user intervention. This helps developers and operators more quickly isolate, diagnose, and remediate issues during an operational event. Contributor Insights helps you understand who or what is impacting your system and application performance, such as a specific resource, customer account, or API call. This enables you to pinpoint outliers, find the heaviest traffic patterns, and rank the most utilized system processes. You can create Contributor Insights rules to evaluate patterns in structured log events as they are sent to CloudWatch Logs, including logs from AWS services like AWS CloudTrail, Amazon Virtual Private Cloud, Amazon API Gateway, and any custom logs sent by your service or on-premises servers, such as Apache access logs. Contributor Insights will evaluate these log events in real-time and display reports that show the top contributors and number of unique contributors in a dataset. A contributor is an aggregate metric based on dimensions contained as log fields in CloudWatch Logs, such as account-id or interface-id in VPC Flow Logs, or any other custom set of dimensions. You can sort and filter contributor data based on your own custom criteria. Contributor Insights report data can be displayed on CloudWatch dashboards, graphed alongside CloudWatch metrics, and added to CloudWatch alarms.
Compliance and Security
Amazon CloudWatch is integrated with AWS Identity and Access Management (IAM) so that you can control which users and resources have permission to access your data and how they can access it.
Amazon CloudWatch Logs is also PCI and FedRamp compliant. Data is encrypted at rest and during transfer. You can also use AWS KMS encryption to encrypt your log groups for added compliance and security.