AWS Cloud Operations

Monitoring and Observability

Gain insights and improve the performance of your applications and infrastructure

Why Monitoring and Observability?

Full-stack observability at AWS includes AWS-native, Application Performance Monitoring (APM), and open-source solutions, giving you the ability to understand what is happening across your technology stack at any time. AWS observability lets you collect, correlate, aggregate, and analyze telemetry from your network, infrastructure, and applications in cloud, hybrid, or on-premises environments so you can gain insights into the behavior, performance, and health of your system. These insights help you detect, investigate, and remediate problems faster. Coupled with artificial intelligence and machine learning, they also help you predict and prevent problems proactively.

Benefits

Understand application health

Know what is going on anywhere and everywhere in your system to provide the best possible experience for your end users. Detect problems quickly, investigate efficiently, and remediate as soon as possible to minimize disruption for your customers and reduce mean time to resolution (MTTR).
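As a rough illustration of the MTTR metric mentioned above, the sketch below averages the time between detection and resolution over a set of incidents. The incident timestamps and the function name `mean_time_to_resolution` are hypothetical and not part of any AWS service; in practice the timestamps would come from your own alarm or ticketing data.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) duration over (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: (detected, resolved) timestamps.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 45)),      # 45 minutes
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 15, 15)),  # 75 minutes
    (datetime(2024, 1, 20, 22, 30), datetime(2024, 1, 20, 23, 0)),  # 30 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 0:50:00
```

Tracking this value over time is one simple way to check whether faster detection and investigation are actually shortening incidents.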

Accelerate collaboration

When application issues occur, route alerts to the right stakeholders from the start. IT and business teams can automate mundane, repetitive tasks and streamline complex ones. Working together, they can use insights from observability data to take a more user-centric approach and deliver exceptional end-user experiences.

Reduce operational cost

Across hundreds of thousands of instances, even a small percentage improvement in the CPU an application uses can add up to millions of dollars in savings. Similarly, by using observability to understand and predict your future capacity needs, you can take advantage of the cost savings available from Reserved Instance and Spot Instance pricing.
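To make the scale argument above concrete, here is a back-of-the-envelope sketch. All inputs are hypothetical (fleet size, hourly price, and improvement percentage are not AWS figures), and it assumes that a CPU reduction translates linearly into compute you no longer pay for, which only holds if the fleet is rightsized accordingly.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def estimated_annual_savings(instance_count, hourly_cost_usd, cpu_reduction):
    """Estimate yearly savings, assuming cost scales linearly with CPU use.

    cpu_reduction is a fraction between 0 and 1 (for example, 0.05 for 5%).
    """
    if not 0 <= cpu_reduction <= 1:
        raise ValueError("cpu_reduction must be between 0 and 1")
    return instance_count * hourly_cost_usd * HOURS_PER_YEAR * cpu_reduction


# Hypothetical fleet: 200,000 instances at $0.10 per hour, 5% less CPU.
savings = estimated_annual_savings(200_000, 0.10, 0.05)
print(f"${savings:,.0f} per year")  # roughly $8.76 million
```

Real savings depend on instance mix, utilization patterns, and pricing model, so treat the result as an order-of-magnitude estimate.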

Increase customer satisfaction

Improve customer experiences and business outcomes by increasing application, infrastructure, and network availability. Reduce downtime and build fast, seamless digital experiences for your end customers. Both your internal teams and your end customers can then operate more efficiently, and your teams can develop and deploy faster.

Lennar

To support our growing multicloud footprint, we needed a scalable, cost-effective observability solution that could unify monitoring across Linux, Windows, and Kubernetes environments without increasing operational burden. By implementing Amazon Managed Service for Prometheus and Amazon Managed Grafana, we established a unified observability platform that delivers real-time, end-to-end visibility across our compute landscape. The fully managed nature of these services eliminated infrastructure management overhead while improving reliability, allowing our engineering teams to focus on delivering customer value rather than maintaining tooling.

Carlos Sanchez - Sr. Manager of Cloud Platform

Rego Consulting

Over the past year, CloudWatch Synthetics and a simple system based on Amazon CloudWatch Alarms, Amazon SES, and AWS Lambda functions have proactively allowed us to respond to our customers' application and infrastructure issues. With CloudWatch Synthetics, our DevOps and support teams have been able to begin analyzing and resolving problems even before the client notifies us of the issue. CloudWatch Synthetics is a critical component of exceeding SLAs/SLOs for our customers and, ultimately, our success.

Steve Seaney, SVP, SaaS DevOps and Architecture, Rego Consulting

Booking.com

We were looking for an easy and seamless integration that we could get up and running quickly to collect core web vital metrics for our products. We have been using Amazon CloudWatch RUM to monitor our website performance, specifically page load times, JavaScript errors, and other core web vital metrics. Using RUM has helped our team collect and measure real-world performance metrics of our websites, while also giving us a unified way to collect and analyze that data. What made RUM stand out was how it integrated seamlessly with our products and other parts of CloudWatch, allowing us to use collected data for further processing, without the added worry of loss of connectivity or data shortage.

Matt Crouch, Web Architect, Booking.com

Mapbox

We were looking to consolidate all our monitoring, logging, metrics, and alerting under one tool. CloudWatch has helped us alleviate the operational burden to set up, configure, and learn third-party systems. Our teams use CloudWatch extensively to monitor error rates and status codes for multiple high-profile workloads. We also use CloudWatch to automate Auto Scaling actions, allowing us to optimize the cost of Amazon EC2 instance types powering our Amazon ECS clusters. CloudWatch Events enable us to provide utilization and pricing information to teams so they can audit account security, trigger AWS Lambda actions for compliance and security use cases, and schedule our resources using the cloud. CloudWatch enables next-level automation and expands the capacity of each individual.

Emily McAfee, Platform Engineering Manager, Mapbox

HP Print Business

HP Print Org supports over 500 services running on Amazon Elastic Kubernetes Service (EKS). The team used self-hosted Prometheus to monitor the hardware and services metrics. As the platform grew, they struggled to keep up with the monitoring, especially maintaining the self-hosted, multi-region Prometheus setup.

Venkat Prasad Durga - Software Design Specialist at HP Print Business

Resources

Workshop

One Observability Workshop

Get hands-on experience with the wide variety of tools AWS offers to set up monitoring and observability for your applications.

View lab

Blog

Viewing Amazon CloudWatch metrics with Amazon Managed Service for Prometheus and Amazon Managed Grafana.

View blog

Video

Learn how AWS Cloud Operations is built for monitoring and operating at cloud scale.

See video playlist

Related services

Amazon CloudWatch

AWS X-Ray

Amazon Managed Grafana

Amazon Managed Service for Prometheus
