Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. With CloudWatch, you can collect and access all your performance and operational data in form of logs and metrics from a single platform. This allows you to overcome the challenge of monitoring individual systems and applications in silos (server, network, database, etc.). CloudWatch enables you to monitor your complete stack (applications, infrastructure, and services) and leverage alarms, logs, and events data to take automated actions and reduce Mean Time to Resolution (MTTR). This frees up important resources and allows you to focus on building applications and business value.
CloudWatch gives you actionable insights that help you optimize application performance, manage resource utilization, and understand system-wide operational health. CloudWatch provides up to 1-second visibility of metrics and logs data, 15 months of data retention (metrics), and the ability to perform calculations on metrics. This allows you to perform historical analysis for cost optimization and derive real-time insights into optimizing applications and infrastructure resources.
Easily collect and store logs
The Amazon CloudWatch Logs service allows you to collect and store logs from your resources, applications, and services in near real-time. There are three main categories of logs 1) Vended logs. These are natively published by AWS services on behalf of the customer. Currently Amazon VPC Flow Logs and Amazon Route 53 logs are the two supported types. 2) Logs that are published by AWS services. Currently over 30 AWS services publish logs to CloudWatch. These services include Amazon API Gateway, AWS Lambda, AWS CloudTrail, and many others. 3) Custom logs. These are logs from your own application and on-premises resources. You can use AWS Systems Manager to install a CloudWatch Agent, or you can use the PutLogData API action to easily publish logs.
Collecting metrics from distributed applications (such as those built using microservices architectures) is time consuming. Amazon CloudWatch allows you to collect default metrics from more than 70 AWS services, such as Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, AWS Lambda, and Amazon API Gateway, without any action on your part. For example, EC2 instances automatically publish CPU utilization, data transfer, and disk usage metrics to help you understand changes in state. You can use one of seven built-in metrics for API Gateway to detect latency or leverage one of eight built-in metrics for AWS Lambda to detect errors and throttles. If you need more detailed metrics beyond the default metrics, such as shard-level Amazon Kinesis Data Streams metrics, then you can simply opt-in per resource.
Amazon CloudWatch allows you to collect custom metrics from your own applications to monitor operational performance, troubleshoot issues, and spot trends. User activity is an example of a custom metric you can collect and monitor over a period of time. You can use CloudWatch Agent or the PutMetricData API action to publish these metrics to CloudWatch. All the same CloudWatch functionality will be available at up to one-second frequency for your own custom metrics data, including statistics, graphs, and alarms.
Unified operational view with dashboards
Amazon CloudWatch dashboards enable you to create re-usable graphs and visualize your cloud resources and applications in a unified view. You can graph metrics and logs data side by side in a single dashboard to quickly get the context and go from diagnosing the problem to understanding the root cause. For example, you can visualize key metrics, like CPU utilization and memory, and compare them to capacity. You can also correlate the log pattern of a specific metric and set alarms to be proactively alerted about performance and operational issues. This gives you system-wide visibility into operational health and the ability to quickly troubleshoot issues, reducing Mean Time to Resolution (MTTR).
High resolution alarms
Amazon CloudWatch alarms allow you to set a threshold on metrics and trigger an action. You can create high-resolution alarms, set a percentile as the statistic, and either specify an action or ignore as appropriate. For example, you can create alarms on Amazon EC2 metrics, set notifications, and take one or more actions to detect and shut down unused or underutilized instances. Real-time alarming on metrics and events enables you to minimize downtime and potential business impact.
Logs and metrics correlation
Applications and infrastructure resources generate lots of operational and monitoring data in form of logs and metrics. In addition to providing ability to access and visualize these data sets in a single platform, Amazon CloudWatch also makes it easy to correlate metrics and logs. This helps you quickly go from diagnosing the problem to understanding the root cause. For example, you can correlate a log pattern, such as an error to a specific metric, and set alarms to be actively alerted of performance and operational issues.
Auto Scaling helps you automate capacity and resource planning. You can set a threshold to alarm on a key metric and trigger an automated Auto Scaling action. For example, you could set up an Auto Scaling workflow to add or remove EC2 instances based on CPU utilization metrics and optimize resource costs.
Automate response to operational changes with CloudWatch Events
CloudWatch Events provides a near real-time stream of system events that describe changes to your AWS resources. It allows you to respond quickly to operational changes and take corrective action. You simply write rules to indicate which events are of interest to your application and what automated actions to take when a rule matches an event. You can, for example, set a rule to invoke AWS Lambda functions or notify an Amazon Simple Notification Service (SNS) topic.
Granular data and extended retention
Amazon CloudWatch allows you to monitor trends and seasonality with 15 months of metric data (storage and retention). This data allows you to perform historical analysis to fine-tune resource utilization. With CloudWatch, you can also collect up to 1 second of health metrics including custom ones, such as those coming from your on-premises applications. Granular real-time data enables better visualization and ability to spot and monitor trends to optimize application performance and operational health.
Custom operations on metrics
Amazon CloudWatch Metric Math enables you to perform calculations across multiple metrics for real-time analysis so you can easily derive insights from your existing CloudWatch metrics and better understand the operational health and performance of your infrastructure. You can visualize these computed metrics in the AWS Management Console, add them to CloudWatch dashboards, or retrieve them using the GetMetricData API action. Metric Math supports arithmetic operations such as +, -, /, *, and mathematical functions such as Sum, Average, Min, Max, and Standard Deviation.
Amazon CloudWatch Logs Insights enables you to drive actionable intelligence from your logs to address operational issues without needing to provision servers or manage software. You can instantly begin writing queries with aggregations, filters, and regular expressions. In addition, you can visualize timeseries data, drill down into individual log events, and export query results to CloudWatch Dashboards. This gives you complete operational visibility. With a few clicks in the AWS Management Console, you can start using Logs Insights to query logs sent to CloudWatch. You only pay for the queries you run.
Compliance and Security
Amazon CloudWatch is integrated with AWS Identity and Access Management (IAM) so that you can control which users and resources have permission to access your data and how they can access it.
Amazon CloudWatch Logs is also PCI and FedRamp compliant. Data is encrypted at rest and during transfer. You can also use AWS KMS encryption to encrypt your log groups for added compliance and security.