We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.
If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”
Customize cookie preferences
We use cookies and similar tools (collectively, "cookies") for the following purposes.
Essential
Essential cookies are necessary to provide our site and services and cannot be deactivated. They are usually set in response to your actions on the site, such as setting your privacy preferences, signing in, or filling in forms.
Performance
Performance cookies provide anonymous statistics about how customers navigate our site so we can improve site experience and performance. Approved third parties may perform analytics on our behalf, but they cannot use the data for their own purposes.
Allowed
Functional
Functional cookies help us provide useful site features, remember your preferences, and display relevant content. Approved third parties may set these cookies to provide certain site features. If you do not allow these cookies, then some or all of these services may not function properly.
Allowed
Advertising
Advertising cookies may be set through our site by us or our advertising partners and help us deliver relevant marketing content. If you do not allow these cookies, you will experience less relevant advertising.
Allowed
Blocking some types of cookies may impact your experience of our sites. You may review and change your choices at any time by selecting Cookie preferences in the footer of this site. We and selected third-parties use cookies or similar technologies as specified in the AWS Cookie Notice.
Unable to save cookie preferences
We will only store essential cookies at this time, because we were unable to save your cookie preferences.
If you want to change your cookie preferences, try again later using the link in the AWS console footer, or contact support if the problem persists.
Monitoring and observability are critical components for ensuring the availability,
performance, reliability, and security of your cloud-based workloads and data.
Monitoring involves the systematic collection and analysis of data, such as
metrics, logs, and traces, to track the health and efficiency of cloud resources
as well as supporting reactive incident management.
Observability focuses on understanding the internal state of a system through
dynamic, real-time insights, allowing for proactive issue identification and
resolution.
AWS offers a range of tools and services for both monitoring and observability. They
can be used to collect data, analyze metrics, and create alarms to notify you of issues.
In addition, they can provide logs and metrics that you can use to identify and
troubleshoot the root cause of problems.
These services integrate with more than 120 other AWS services (including Amazon EC2, Amazon EKS,
Amazon ECS, Lambda, and Amazon S3) and partners, and integrates with a wide range of third-party
observability and cloud management tools that use near real-time feeds of AWS-native
telemetry.
This guide will help you select the AWS monitoring and observability services and
tools that are the best fit for your needs and your organization.
In this four-minute clip from his re:Invent 2023 presentation, senior AWS
worldwide specialist Toshal Dudhwala outlines how to build an observability
strategy.
Understand
To choose the right AWS monitoring and observability tools for your needs, it may help to
first understand the range of options available to you and how the main services fit
together.
Start with your three key data sources: logs, metrics, and traces. The data from those
sources can be consumed using Amazon CloudWatch, AWS X-Ray, or AWS Distro for OpenTelemetry
(ADOT) agents.
Here’s when you might use each of these data collection sources:
Use Amazon CloudWatch to collect custom metrics from your own applications to monitor operational
performance, troubleshoot issues, and spot trends. You can also use the CloudWatch agent for
collecting log, metrics and traces.. In addition, you can use open source tools such as
Fluent D or FluentBit to collect logs and send them to CloudWatch logs.
Use AWS X-Ray to perform distributed
tracing across multiple applications and systems to help find latency in a system
and target it for improvement. You can use the CloudWatch agent to collect traces and send
them to X-Ray.
Use AWS Distro for OpenTelemetry to collect metrics and traces.
Instrumentation
There are two major categories of instrumentation available within AWS monitoring and
observability services: AWS Native Services and Open Source Managed Services.
Open Source Managed Services include Amazon Managed Service for Prometheus (a managed
monitoring service based on and compatible with the popular Prometheus open source
monitoring and alerting solution), Amazon OpenSearch Service, and AWS Distro for
OpenTelemetry (which not only supports AWS X-Ray, but also Jaeger and Zipkin
Tracing).
Other services important to monitoring and observability include:
AWS Config provides a detailed view of your resource configurations in your AWS account.
This view includes the relationship between your resources and the past configurations of
your resources, so you can see how the relationships and configurations of your resources
change over time. If you are using AWS Config rules, AWS Config evaluates your resource configurations for desired
settings.
AWS CloudTrail helps you enable operational and risk auditing, governance, and
compliance by recording events of actions taken by users, roles or AWS services. Actions
taken by a user, role, or an AWS service are recorded as events in CloudTrail. Events
include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS
SDKs and APIs.
In addition, you can select from a range of machine learning and analytics services to gain further benefit from your monitoring and observability
data.
Consider
Choosing the right monitoring and observability services on AWS depends on your specific
requirements and use cases. Here are some criteria to consider when making your decision.
Monitoring service capabilities
Consider whether the service provides a
comprehensive set of tools that encompass metrics, logs, and traces. Metrics
offer quantitative data on system performance, logs provide detailed event information,
and traces allow you to follow transactions across your infrastructure.
Also assess whether the service supports diverse data types and formats. Additionally,
look for advanced features such as anomaly detection, machine learning-driven insights,
and the ability to correlate data from different sources. A well-rounded solution should
enable holistic visibility into your AWS environment, aiding in efficient troubleshooting,
performance optimization, and proactive problem resolution.
The more versatile and integrated the service capabilities, the better equipped you
are to gain deep insights into your applications and infrastructure. Review the AWS Observability section of the Management and Governance Cloud Environment
Guide (part of the AWS Well-Architected Framework) for more details on service
capabilities.
Ease of integration
Assess how seamlessly the service integrates with your existing AWS infrastructure,
applications, and deployment processes.
Look for compatibility with popular programming languages, frameworks, and third-party
tools that your organization uses. Also evaluate the availability of SDKs, APIs, and
plugins that simplify the integration process. Better integration can facilitate the
collection and analysis of data without imposing significant overhead on your
applications.
Data retention and storage capabilities are pivotal considerations in selecting AWS
monitoring and observability services. For any service you are considering, examine
policies on storing and retaining historical data, as well as scalability to handle
increasing data volumes over time.
Assess whether the service supports long-term storage of metrics, logs, and traces,
enabling you to perform retrospective analysis and meet compliance requirements. Consider
also the ease with which you can access and retrieve archived data.
The service (or services) you use should achieve a balance between providing
sufficient retention periods for meaningful trend analysis and managing storage costs
effectively. A clear understanding of data retention and storage policies is important
when considering how your monitoring setup aligns with both operational needs and
regulatory obligations.
Scalability
Evaluate how well the service can scale alongside your evolving infrastructure and
growing workloads. A scalable solution should seamlessly handle increases in data volume,
user activity, and the complexity of your applications.
Consider the elasticity of the service, its ability to accommodate spikes in demand,
and whether it supports auto-scaling features to adapt to changing requirements
dynamically. Robust scalability helps ensure that your monitoring system remains
responsive and effective, providing timely insights even as your AWS environment
expands.
By choosing a service with strong scalability, you can confidently support the
continuous growth of your applications and infrastructure without compromising on
performance or incurring unnecessary operational challenges.
Alerting and notification
Assess the alerting capabilities of the service, including the ability to set up
alerts based on predefined thresholds, anomalies, or specific events. Look for flexibility
in configuring alert conditions and the ease of managing notification channels such as
email, SMS, or integrations with collaboration tools.
The service (or services) you choose should provide timely and actionable alerts,
enabling your team to respond promptly to potential issues. Consider features such as
escalation policies and the ability to acknowledge or suppress alerts.
Integration with popular incident management platforms can enhance the overall
incident response workflow. Prioritize a monitoring service that empowers your team to
proactively address issues, minimizing downtime and ensuring the continuous health of your
AWS environment.
Cost
Understand the pricing model of each service, considering factors such as data volume,
storage, and any additional features. Review cost information for any service you are
considering (such as this billing and cost summary for Amazon CloudWatch).
Evaluate whether the pricing structure aligns with your budget and usage patterns.
Some services may offer a pay-as-you-go model, while others may have tiered pricing or
subscription plans. Consider the potential impact of all costs – including data transfer
fees or charges for accessing historical data.
Additionally, assess whether the pricing scales efficiently with the growth of your
infrastructure. A clear understanding of costs ensures that your monitoring solution
remains cost-effective without compromising on essential features, allowing you to
optimize your budget while meeting your operational requirements on AWS.
Customization and extensibility
Assess whether the service allows you to tailor dashboards, reports, and alerts to
meet your needs. Look for the flexibility to create custom metrics, queries, and
visualizations. Integration with third-party tools and support for common APIs enhance the
service's extensibility. Evaluate whether the monitoring solution can adapt to the unique
needs of your applications and infrastructure.
A highly customizable and extensible service empowers your team to fine-tune
monitoring parameters, adapt to evolving use cases, and integrate seamlessly with your
existing workflows and tools. Prioritize solutions that provide a high degree of
configurability, allowing you to optimize monitoring for your specific AWS environment and
operational preferences.
Security and compliance
Evaluate how a service provides adherence to AWS security best practices, ensuring data confidentiality,
integrity, and availability. Check for features such as encryption in transit and at rest,
access controls, and secure authentication mechanisms. Assess whether the service supports
compliance with relevant regulations and standards applicable to your industry.
Look for audit trail capabilities and the ability to generate compliance reports. The
goal is to help safeguard sensitive data by using monitoring practices to align with
regulatory requirements.
Prioritize services that provide a robust security posture, enabling your organization
to maintain a secure and compliant AWS environment while gaining insights into your
applications and infrastructure.
Machine learning and analytics
Evaluate whether the service uses machine learning (ML) to provide advanced insights,
anomaly detection, and predictive analytics. Look for features that automatically identify
patterns, trends, and potential issues within your data.
A robust machine learning component can enhance the accuracy of anomaly detection,
reducing false positives and improving the overall effectiveness of your monitoring
system. Additionally, consider the depth of analytics provided, such as root cause
analysis and trend forecasting. A service with strong machine learning and analytics
capabilities empowers your team to proactively address issues, optimize performance, and
gain deeper insights into the behavior of your AWS applications and infrastructure.
Global reach
Global reach is a critical criterion for AWS monitoring and observability services,
particularly if your infrastructure is distributed across multiple Regions. Assess whether
the monitoring service provides visibility into the performance and health of your
resources across different AWS Regions.
Consider the ability to aggregate and analyze data from diverse geographical
locations, ensuring a comprehensive understanding of your global infrastructure. Look for
features that support centralized management and monitoring, allowing you to efficiently
oversee operations on a global scale.
A service with strong global reach helps ensure that you can maintain consistent
monitoring practices, troubleshoot issues, and optimize performance seamlessly across the
entire spectrum of your AWS deployment, irrespective of geographical boundaries. This
capability is particularly valuable for organizations with a geographically distributed or
multi-cloud infrastructure.
Consider whether the service provides a
comprehensive set of tools that encompass metrics, logs, and traces. Metrics
offer quantitative data on system performance, logs provide detailed event information,
and traces allow you to follow transactions across your infrastructure.
Also assess whether the service supports diverse data types and formats. Additionally,
look for advanced features such as anomaly detection, machine learning-driven insights,
and the ability to correlate data from different sources. A well-rounded solution should
enable holistic visibility into your AWS environment, aiding in efficient troubleshooting,
performance optimization, and proactive problem resolution.
The more versatile and integrated the service capabilities, the better equipped you
are to gain deep insights into your applications and infrastructure. Review the AWS Observability section of the Management and Governance Cloud Environment
Guide (part of the AWS Well-Architected Framework) for more details on service
capabilities.
Choose
Now that you know the criteria by which you will be evaluating your monitoring and
observability options, you are ready to choose which AWS monitoring and observability services
might be a good fit for your organizational requirements.
The following table highlights which services are optimized for which circumstances. Use the
table to help determine the service that is the best fit for your organization and use
case.
Use case
What is it optimized for?
Monitoring and observability services
Monitoring and alerting
These services are optimized to provide real-time visibility, proactive issue
detection, resource optimization, and efficient incident response, contributing to
overall application and infrastructure health.
These services provide comprehensive insights into application behavior, offer
tools for identifying and resolving performance bottlenecks, aid in efficient
troubleshooting, and contribute to delivering modern user experiences across
distributed and web applications.
These services provide a holistic view of your cloud resources, helping you make
more informed decisions about resource utilization, performance optimization, and
cost-efficiency.
These services help you efficiently manage and analyze log data, troubleshoot,
detect anomalies, support security, meeting compliance requirements, and get
actionable insights into your applications and infrastructure.
Optimized to provide a robust security framework, enabling proactive threat
detection, continuous monitoring, compliance tracking, and audit capabilities to help
safeguard your AWS resources and maintain a secure and compliant environment.
These services provide visibility into network traffic, enhance security by
detecting and preventing threats, enable efficient network traffic management, and
support incident response activities.
These services provide a comprehensive view of the interactions and dependencies
within your distributed applications. They enable you to diagnose performance
bottlenecks, optimize application performance, and support the smooth functioning of
complex systems by offering insights into how different parts of your application
communicate and interact.
Maintain reliable operations, provide modern digital experiences for your
customers, and get help to meet service level objectives and performance commitments.
You should now have a clear understanding of what each AWS monitoring and observability
service (and the supporting AWS tools and services) does, and which might be right for
you.
To explore how to use and learn more about each of the available AWS observability
services, we have provided a pathway to explore how each of the services work. The following
section provides links to in-depth documentation, hands-on tutorials, and resources to get you
started.
Amazon CloudWatch
Getting Started with Amazon CloudWatch
Monitor your AWS resources and the applications you run on AWS in real time using Amazon CloudWatch. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
Set up Container Insights on Amazon EKS and Kubernetes
Set up the Amazon CloudWatch Observability ESK add-on and ADTO on your EKS cluster to send metrics to CloudWatch. You will also learn how to set up Fluent Bit or Fluentd to send logs to CloudWatch Logs.
Learn to configure cluster and service level metrics, deploy ADOT to collect EC2 instance level metrics, and set up FireLens to send logs to CloudWatch Logs.
Getting started with Amazon CloudWatch Application Signals
In this guide, you will learn how to automatically instrument your applications on AWS so that you can monitor current application health and track long-term application performance against your business objectives.
Amazon CloudWatch Application Signals for automatic instrumentation of your applications
This blog post provides an in-depth walk-through the AWS Management Console for Amazon CloudWatch Application Signals demonstrating how to collect telemetry for your EKS clusters.
How to monitor application health using SLOs with Amazon CloudWatch Application Signals
This blog post demonstrates how Amazon CloudWatch Application signals enables you to automatically instrument and operate applications on AWS to track application performance against your most important objectives.
Learn how to create a few “Hello World” Lambda functions and monitor them using Lambda Insights. You will be using the AWS CDK to deploy the architecture.
Use Logs Insights to utilize the data points, patterns, trends, and insights present in all the various logs created by AWS services to understand how your applications and AWS resources are behaving, identify room for improvement, and address operational issues.
Monitor your AWS resources and the applications you run on AWS in real time using Amazon CloudWatch. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
Set up Container Insights on Amazon EKS and Kubernetes
Set up the Amazon CloudWatch Observability ESK add-on and ADTO on your EKS cluster to send metrics to CloudWatch. You will also learn how to set up Fluent Bit or Fluentd to send logs to CloudWatch Logs.
Learn to configure cluster and service level metrics, deploy ADOT to collect EC2 instance level metrics, and set up FireLens to send logs to CloudWatch Logs.
AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Here's how to get started with it.
Learn how to integrate CloudTrail logs into CloudWatch and use features such as CloudWatch Log Insights, CloudWatch Metric Filters, CloudWatch Metric Alarms and CloudWatch Dashboards.
Build a security log analysis platform on Amazon OpenSearch Service and get started with building a cost-efficient solution for log ingestion, analysis and dashboarding.
This guide will walk you through launching a sample application. Then you will learn how to instrument your application and explore other services that are integrated with X-Ray.
Application logging and monitoring using AWS X-Ray
Learn how AWS X-Ray collects data about requests that your application serves, and it helps you view, filter, and gain insights into that data to identify issues and opportunities for optimization.
AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Here's how to get started with it.
Learn how to integrate CloudTrail logs into CloudWatch and use features such as CloudWatch Log Insights, CloudWatch Metric Filters, CloudWatch Metric Alarms and CloudWatch Dashboards.