Taking the first step
Purpose | Help determine which AWS monitoring and observability services are the best fit for your organization. |
Last updated | January 12, 2024 |
Covered services | |
Introduction
Monitoring and observability are critical components for ensuring the availability, performance, reliability, and security of your cloud-based workloads and data.
-
Monitoring involves the systematic collection and analysis of data, such as metrics, logs, and traces, to track the health and efficiency of cloud resources and to support reactive incident management.
-
Observability focuses on understanding the internal state of a system through dynamic, real-time insights, allowing for proactive issue identification and resolution.
AWS offers a range of tools and services for both monitoring and observability. They can be used to collect data, analyze metrics, and create alarms to notify you of issues. In addition, they can provide logs and metrics that you can use to identify and troubleshoot the root cause of problems.
These services integrate with more than 120 other AWS services (including Amazon EC2, Amazon EKS, Amazon ECS, Lambda, and Amazon S3) and with partners. They also integrate with a wide range of third-party observability and cloud management tools that use near real-time feeds of AWS-native telemetry.
This guide will help you select the AWS monitoring and observability services and tools that are the best fit for your needs and your organization.
In this four-minute clip from his re:Invent 2023 presentation, senior AWS worldwide specialist Toshal Dudhwala outlines how to build an observability strategy.
Understand
To choose the right AWS monitoring and observability tools for your needs, it may help to first understand the range of options available to you and how the main services fit together.

Start with your three key data sources: logs, metrics, and traces. The data from those sources can be consumed using Amazon CloudWatch, AWS X-Ray, or AWS Distro for OpenTelemetry (ADOT) agents.
Here’s when you might use each of these data collection options:
-
Use Amazon CloudWatch to collect custom metrics from your own applications to monitor operational performance, troubleshoot issues, and spot trends. You can also use the CloudWatch agent to collect logs, metrics, and traces. In addition, you can use open source tools such as Fluentd or Fluent Bit to collect logs and send them to CloudWatch Logs.
-
Use AWS X-Ray to perform distributed tracing across multiple applications and systems to help find latency in a system and target it for improvement. You can use the CloudWatch agent to collect traces and send them to X-Ray.
-
Use AWS Distro for OpenTelemetry to collect metrics and traces.
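As a rough illustration of the first option above, the sketch below builds the payload for one custom metric data point. It assumes the AWS SDK for Python (boto3) and its `put_metric_data` operation. The namespace `MyApp/Checkout`, the metric name `OrderLatency`, and the `Service` dimension are hypothetical names chosen for this example. The network call is kept in a separate function so the payload builder can run without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Milliseconds", dimensions=None, timestamp=None):
    """Return one entry for the MetricData list of put_metric_data."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        # Default to the current time in UTC when no timestamp is given.
        "Timestamp": timestamp or datetime.now(timezone.utc),
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


def publish(namespace, data):
    """Send data points to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical metric for illustration only.
datum = build_metric_datum("OrderLatency", 182, dimensions={"Service": "checkout"})
print(datum["MetricName"], datum["Value"], datum["Unit"])
# publish("MyApp/Checkout", [datum])  # uncomment to send to your account
```

Separating the builder from the call also makes the payload easy to unit test before anything is sent to your account.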
Instrumentation
There are two major categories of instrumentation available within AWS monitoring and observability services: AWS Native Services and Open Source Managed Services.
-
AWS Native Services include Amazon CloudWatch and AWS X-Ray. Key CloudWatch features such as Container Insights, Lambda Insights, Contributor Insights, and Application Insights help you contextualize your data for insights and analysis.
-
Open Source Managed Services include Amazon Managed Service for Prometheus (a managed monitoring service based on and compatible with the popular Prometheus open source monitoring and alerting solution), Amazon OpenSearch Service, and AWS Distro for OpenTelemetry (which supports not only AWS X-Ray but also Jaeger and Zipkin tracing).
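To make the tracing side of instrumentation more concrete, here is a minimal sketch of the trace identifier that AWS X-Ray propagates between services in the `X-Amzn-Trace-Id` HTTP header. The helper names `new_trace_id` and `parse_trace_header` are made up for this example. In practice an X-Ray SDK or ADOT would normally generate and propagate these values for you.

```python
import re
import secrets
import time

# Version 1 trace ID: "1-<8 hex digits of epoch seconds>-<24 random hex digits>"
TRACE_ID_RE = re.compile(r"^1-[0-9a-f]{8}-[0-9a-f]{24}$")


def new_trace_id(now=None):
    """Generate a trace ID in the X-Ray format (hypothetical helper)."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def parse_trace_header(header):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of its fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that are not key=value pairs
            fields[key] = value
    return fields


header = f"Root={new_trace_id()};Parent={secrets.token_hex(8)};Sampled=1"
fields = parse_trace_header(header)
print(fields["Root"], "sampled" if fields.get("Sampled") == "1" else "not sampled")
```

Because the first segment after the version encodes the request start time, a trace ID alone tells you roughly when the request entered the system.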
Visualization and analysis
The data you collect and ingest with these AWS services can be visualized and analyzed using the Amazon CloudWatch Service Map, the AWS X-Ray trace map, Amazon Managed Grafana, and Amazon CloudWatch Logs Insights.
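For a sense of what analysis with CloudWatch Logs Insights looks like, the sketch below assembles a query that lists the most recent log events containing the word ERROR, along with arguments for the boto3 `start_query` operation. The log group name `/myapp/production` is a placeholder, and the call to AWS is left commented out because it requires credentials.

```python
import time


def error_query(pattern="ERROR", limit=20):
    """Build a Logs Insights query for the newest events matching a pattern."""
    return " | ".join([
        "fields @timestamp, @message",
        f"filter @message like /{pattern}/",
        "sort @timestamp desc",
        f"limit {limit}",
    ])


def start_query_args(log_group, query, minutes=60, now=None):
    """Arguments for logs.start_query covering the last `minutes` minutes."""
    end = int(time.time() if now is None else now)
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }


args = start_query_args("/myapp/production", error_query())  # placeholder log group
print(args["queryString"])
# import boto3
# query_id = boto3.client("logs").start_query(**args)["queryId"]
```

Queries run asynchronously, so a real script would then poll `get_query_results` with the returned query ID until the query completes.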
Other services
Other services important to monitoring and observability include:
-
AWS Config provides a detailed view of your resource configurations in your AWS account. This view includes the relationship between your resources and the past configurations of your resources, so you can see how the relationships and configurations of your resources change over time. If you are using AWS Config rules, AWS Config evaluates your resource configurations for desired settings.
-
AWS CloudTrail helps you enable operational and risk auditing, governance, and compliance. Actions taken by a user, role, or AWS service are recorded as events in CloudTrail. Events include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs.
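CloudTrail delivers the events it records as JSON. As an illustration, the sketch below counts API calls per service and action in a log file body and lists the calls that failed. The two sample records are invented for this example and show only a handful of the fields a real record carries.

```python
import json
from collections import Counter

# Invented sample in the shape of a CloudTrail log file: {"Records": [...]}
SAMPLE = json.dumps({"Records": [
    {"eventTime": "2024-01-12T10:00:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances", "awsRegion": "us-east-1",
     "userIdentity": {"type": "IAMUser", "userName": "alice"}},
    {"eventTime": "2024-01-12T10:01:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateUser", "awsRegion": "us-east-1",
     "userIdentity": {"type": "AssumedRole"}, "errorCode": "AccessDenied"},
]})


def summarize(log_body):
    """Count calls per (service, action) and collect records with an error."""
    records = json.loads(log_body).get("Records", [])
    calls = Counter((r["eventSource"], r["eventName"]) for r in records)
    failures = [r for r in records if "errorCode" in r]
    return calls, failures


calls, failures = summarize(SAMPLE)
for (source, name), count in calls.most_common():
    print(f"{source} {name}: {count}")
for record in failures:
    print("failed:", record["eventName"], record["errorCode"])
```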
In addition, you can select from a range of machine learning
Consider
Choosing the right monitoring and observability services on AWS depends on your specific requirements and use cases. Here are some criteria to consider when making your decision.
Consider whether the service provides a comprehensive set of tools that encompass metrics, logs, and traces.
Also assess whether the service supports diverse data types and formats. Additionally, look for advanced features such as anomaly detection, machine learning-driven insights, and the ability to correlate data from different sources. A well-rounded solution should enable holistic visibility into your AWS environment, aiding in efficient troubleshooting, performance optimization, and proactive problem resolution.
The more versatile and integrated the service capabilities, the better equipped you are to gain deep insights into your applications and infrastructure. Review the AWS Observability section of the Management and Governance Cloud Environment Guide (part of the AWS Well-Architected Framework) for more details on service capabilities.
Choose
Now that you know the criteria by which you will be evaluating your monitoring and observability options, you are ready to choose which AWS monitoring and observability services might be a good fit for your organizational requirements.
The following table highlights which services are optimized for which circumstances. Use the table to help determine the service that is the best fit for your organization and use case.
Use case | What is it optimized for? | Monitoring and observability services |
---|---|---|
Monitoring and alerting | These services are optimized to provide real-time visibility, proactive issue detection, resource optimization, and efficient incident response, contributing to overall application and infrastructure health. | |
Application performance monitoring | These services provide comprehensive insights into application behavior, offer tools for identifying and resolving performance bottlenecks, aid in efficient troubleshooting, and contribute to delivering modern user experiences across distributed and web applications. | Amazon CloudWatch Application Signals |
Infrastructure observability | These services provide a holistic view of your cloud resources, helping you make more informed decisions about resource utilization, performance optimization, and cost-efficiency. | |
Logging and analysis | These services help you efficiently manage and analyze log data, troubleshoot, detect anomalies, support security, meet compliance requirements, and get actionable insights into your applications and infrastructure. | Amazon CloudWatch Logs Insights |
Security and compliance monitoring | Optimized to provide a robust security framework, enabling proactive threat detection, continuous monitoring, compliance tracking, and audit capabilities to help safeguard your AWS resources and maintain a secure and compliant environment. | |
Network monitoring | These services provide visibility into network traffic, enhance security by detecting and preventing threats, enable efficient network traffic management, and support incident response activities. | Amazon CloudWatch Network Monitor |
Distributed tracing | These services provide a comprehensive view of the interactions and dependencies within your distributed applications. They enable you to diagnose performance bottlenecks, optimize application performance, and support the smooth functioning of complex systems by offering insights into how different parts of your application communicate and interact. | |
Hybrid and multicloud observability | Maintain reliable operations, provide modern digital experiences for your customers, and get help to meet service level objectives and performance commitments. | |
Use
You should now have a clear understanding of what each AWS monitoring and observability service (and the supporting AWS tools and services) does, and which might be right for you.
To help you learn how to use each of the available AWS observability services, we have provided a pathway for exploring how each service works. The following section provides links to in-depth documentation, hands-on tutorials, and resources to get you started.
-
Getting Started with Amazon CloudWatch
Monitor your AWS resources and the applications you run on AWS in real time using Amazon CloudWatch. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
-
Getting started with Amazon CloudWatch Metrics
This guide discusses basic monitoring and detailed monitoring, how to graph metrics, and how to use CloudWatch anomaly detection.
-
Set up Container Insights on Amazon EKS and Kubernetes
Set up the Amazon CloudWatch Observability EKS add-on and ADOT on your EKS cluster to send metrics to CloudWatch. You will also learn how to set up Fluent Bit or Fluentd to send logs to CloudWatch Logs.
-
Getting started with Amazon CloudWatch Application Insights
Learn how to use the console to enable CloudWatch Application Insights to manage your applications for monitoring.
-
Using Container Insights
Learn how CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices.
-
Setting up Container Insights on Amazon ECS
Learn to configure cluster-level and service-level metrics, deploy ADOT to collect EC2 instance-level metrics, and set up FireLens to send logs to CloudWatch Logs.
-
Getting started with AWS CloudTrail
AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Here's how to get started with it.
-
Review AWS account activity
Learn how to review the AWS API activity in your AWS account for services that support CloudTrail.
-
Create a trail
Learn how to create a trail to log AWS API activity in all Regions, including data events and Insights events.
-
AWS CloudTrail Log Monitoring workshop
Learn how to integrate CloudTrail logs into CloudWatch and use features such as CloudWatch Logs Insights, CloudWatch Metric Filters, CloudWatch Metric Alarms, and CloudWatch Dashboards.
-
AWS CloudTrail best practices
Best practices for using CloudTrail to enable auditing across your organization.
Explore
-
Solutions
Explore solutions to help you implement monitoring and observability on AWS.
-
Whitepapers
Explore whitepapers to help you get started, learn best practices, and understand your monitoring and observability options.
-
Video, patterns, and guidance
Explore additional architectural guidance covering common use cases for monitoring and observability services.