Monitoring and Observability
Gain insights and improve the performance of your applications and infrastructure
Full-stack observability at AWS includes AWS-native, Application Performance Monitoring (APM), and open-source solutions, giving you the ability to understand what is happening across your technology stack at any time. AWS observability lets you collect, correlate, aggregate, and analyze telemetry from your network, infrastructure, and applications in cloud, hybrid, or on-premises environments so you can gain insights into the behavior, performance, and health of your system. These insights help you detect, investigate, and remediate problems faster. Coupled with artificial intelligence and machine learning, they also help you proactively react to, predict, and prevent problems.
Understand application health
Know what is going on anywhere and everywhere in your system to provide the best possible experience for your end users. Detect problems quickly, investigate efficiently, and remediate as soon as possible to minimize disruption for your customers and reduce Mean Time to Resolution (MTTR).
When application issues occur, route each alert to the correct stakeholder from the beginning. IT and business teams can automate mundane, repetitive tasks and streamline complex ones. Working together, they can use insights from observability data to take a more user-centric approach and deliver exceptional end user experiences.
Reduce operational cost
Across hundreds of thousands of instances, a small percentage improvement in how much CPU an application uses can add up to millions of dollars in savings. Similarly, by using observability to understand and predict your future capacity needs, you can take advantage of the cost savings available from Reserved Instance and Spot pricing.
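To make the arithmetic behind that claim concrete, here is a minimal sketch. The fleet size, hourly price, and improvement percentage are hypothetical illustration values, not AWS pricing, and the function assumes that a given percentage reduction in CPU usage lets you shrink the fleet by the same percentage, which will not hold for every workload.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def estimate_annual_savings(instance_count, hourly_cost_usd, cpu_reduction):
    """Estimate yearly savings from a fractional CPU reduction.

    Assumption (not from the source): compute spend scales linearly with
    CPU usage, so a 2% CPU reduction translates into 2% less spend.
    """
    baseline_spend = instance_count * hourly_cost_usd * HOURS_PER_YEAR
    return baseline_spend * cpu_reduction


# Hypothetical inputs: 200,000 instances at $0.10 per hour, 2% less CPU.
savings = estimate_annual_savings(200_000, 0.10, 0.02)
print(f"Estimated annual savings: ${savings:,.0f}")
```

With these illustrative inputs the estimate comes to roughly $3.5 million per year, which shows how a single-digit percentage gain becomes a seven-figure number at fleet scale.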
Increase customer satisfaction
Elevate your customer experiences and business outcomes by improving application, infrastructure, and network availability. Reduce downtime and build fast, seamless digital experiences for your end customers. This lets both your internal teams and your end customers operate efficiently, and helps your teams develop and deploy faster.
“Over the past year, CloudWatch Synthetics and a simple system based on Amazon CloudWatch Alarms, Amazon SES, and AWS Lambda functions have proactively allowed us to respond to our customer's application and infrastructure issues. With CloudWatch Synthetics, our DevOps and support teams have been able to begin analyzing and resolving problems even before the client notifies us of the issue. CloudWatch Synthetics is a critical component of exceeding SLAs/SLOs for our customers and, ultimately, our success.”
- Steve Seaney
SVP, SaaS DevOps and Architecture, Rego Consulting
- Matt Crouch
Web Architect, Booking.com
“We were looking to consolidate all our monitoring, logging, metrics, and alerting under one tool. CloudWatch has helped us alleviate the operational burden to set up, configure, and learn third-party systems. Our teams use CloudWatch extensively to monitor error rates and status codes for multiple high-profile workloads. CloudWatch enables next-level automation and expands the capacity of each individual.”
- Emily McAfee
Platform Engineering Manager, Mapbox
“HP Print Org supports over 500 services running on Amazon Elastic Kubernetes Service (EKS). The team used self-hosted Prometheus to monitor the hardware and services metrics. As the platform grew, they struggled to keep up with the monitoring, especially maintaining the self-hosted, multi-region Prometheus setup.”
- Venkat Prasad Durga
Software Design Specialist, HP Print Business
One Observability Workshop: Get hands-on experience with the wide variety of tools AWS offers to set up monitoring and observability for your applications.
Viewing Amazon CloudWatch metrics with Amazon Managed Service for Prometheus and Amazon Managed Grafana.
Learn how AWS Cloud Operations is built for monitoring and operating at cloud scale.
Perform distributed tracing across multiple applications and systems to help find latency in a system and target it for improvement.
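As a rough illustration of how tracing helps locate latency, the sketch below records timed spans that share one trace ID and then picks the slowest leaf span. It is a single-process toy with hypothetical names (`Tracer`, `Span`, `slowest_leaf`), not the API of any AWS service or tracing library. A real distributed setup would also propagate the trace ID across service boundaries and export spans to a tracing backend.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed unit of work within a trace (hypothetical structure)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float = 0.0


class Tracer:
    """Toy in-process tracer; all spans it records share one trace ID."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str):
        parent_id = self._stack[-1].span_id if self._stack else None
        current = Span(self.trace_id, uuid.uuid4().hex[:16], parent_id, name)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(current)

    def slowest_leaf(self) -> Span:
        """Return the slowest span that has no children."""
        parent_ids = {s.parent_id for s in self.spans}
        leaves = [s for s in self.spans if s.span_id not in parent_ids]
        return max(leaves, key=lambda s: s.duration_ms)


# Simulated request: the sleeps stand in for calls to downstream services.
tracer = Tracer()
with tracer.span("checkout"):
    with tracer.span("auth"):
        time.sleep(0.01)
    with tracer.span("db-query"):
        time.sleep(0.05)

slowest = tracer.slowest_leaf()
print(f"Slowest step: {slowest.name} ({slowest.duration_ms:.1f} ms)")
```

Restricting the search to leaf spans avoids always reporting the root span, whose duration includes every child, and points at the specific step worth improving.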