“Is my system up or down?” “Is it fast or slow as experienced by my end users?” “What KPIs and SLAs should we establish, and how do we know if they’re being met?” When you’re operating at cloud speed and scale, you can’t afford to fly blind: you need to be able to answer a wide range of operational and business questions like these. You need to spot problems as they arise (ideally before they disrupt the customer experience), respond quickly, and resolve them with minimal disruption. To achieve this, you need observability into your applications and resources across both AWS and non-AWS services.
What is observability?
“Observability” describes how well you can understand what is happening in a system, often (but not only) by instrumenting it to collect metrics, logs, or traces. Several kinds of tools and activities make a system observable, including monitoring, tracing, profiling, log analysis, and AIOps. Observability enables you to detect, investigate, and remediate problems.
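To make the idea of instrumentation concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory sink (`EMITTED`) in place of a real backend such as CloudWatch, and a hypothetical `instrumented` decorator that wraps a function so every call emits a log record, a latency metric, and a trace span, all tagged with the same trace ID. It illustrates the concept only; it is not the API of any AWS SDK or agent.

```python
import json
import time
import uuid
from functools import wraps
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink standing in for a real log/metric/trace backend.
EMITTED: List[Dict[str, Any]] = []


def instrumented(operation: str) -> Callable:
    """Hypothetical decorator: emit one log, one metric, and one span per call."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, trace_id: Optional[str] = None, **kwargs: Any) -> Any:
            # Reuse the caller's trace ID if given, otherwise start a new trace.
            trace_id = trace_id or uuid.uuid4().hex
            started_at = time.time()      # wall-clock start for the span
            start = time.perf_counter()   # monotonic clock for the duration
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                base = {"trace_id": trace_id, "operation": operation}
                # Log: a human-readable event.
                EMITTED.append({**base, "kind": "log",
                                "message": f"{operation} finished with status={status}"})
                # Metric: a numeric measurement that can be aggregated and alarmed on.
                EMITTED.append({**base, "kind": "metric",
                                "name": "latency_ms", "value": elapsed_ms})
                # Trace span: timing of this unit of work within the larger request.
                EMITTED.append({**base, "kind": "span", "start": started_at,
                                "duration_ms": elapsed_ms, "status": status})

        return wrapper

    return decorator


@instrumented("checkout")
def checkout(cart_total: float) -> float:
    return round(cart_total * 1.08, 2)


checkout(100.0, trace_id="abc123")
for record in EMITTED:
    print(json.dumps(record))
```

Because all three records share `trace_id`, a backend can join a slow metric datapoint to the exact trace and log lines that explain it.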
In the cloud, observability can be hard to achieve because of sheer system complexity. Even legacy monolithic apps are distributed across instances and often across geographic locations. They may also be re-architected into many microservices that rely on thousands of resources to operate, especially if they run on containers or serverless technology. Microservices may be updated frequently, scale elastically, or be invoked on demand. Thousands of components generate billions of metrics, logs, and traces in a never-ending stream of data.
Understand application health and performance to improve customer experience
The main goal of observability is to know what is going on – anywhere and everywhere – in your system so that you can ensure the best possible experience for your end users. You want to detect problems quickly, investigate them efficiently, and remediate them as soon as possible to minimize downtime and other disruptions to your customers.
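Detecting problems quickly usually means alarming on metrics. The following simplified Python sketch shows a threshold alarm loosely modeled on the "M out of N datapoints" style of evaluation that CloudWatch alarms support, and it reuses CloudWatch's three alarm state names. The `ThresholdAlarm` class and the sample latency values are hypothetical; real CloudWatch evaluation (missing-data handling, evaluation periods) is more involved.

```python
from collections import deque
from typing import Deque, List


class ThresholdAlarm:
    """Simplified alarm: ALARM when at least m of the last n datapoints exceed threshold.

    Hypothetical sketch only; it does not reproduce CloudWatch's handling of
    missing data or evaluation periods.
    """

    def __init__(self, threshold: float, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("require 1 <= m <= n")
        self.threshold = threshold
        self.m = m
        # Sliding window holding only the n most recent datapoints.
        self.window: Deque[float] = deque(maxlen=n)
        self.state = "INSUFFICIENT_DATA"

    def observe(self, value: float) -> str:
        """Add one datapoint and return the resulting alarm state."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            # Not enough history yet to evaluate the full window.
            self.state = "INSUFFICIENT_DATA"
        else:
            breaching = sum(1 for v in self.window if v > self.threshold)
            self.state = "ALARM" if breaching >= self.m else "OK"
        return self.state


# Hypothetical per-minute p99 latency values in milliseconds.
latencies_ms: List[float] = [120, 130, 480, 510, 125, 530]
alarm = ThresholdAlarm(threshold=400, m=3, n=5)
states = [alarm.observe(v) for v in latencies_ms]
print(states)
```

Requiring 3 of 5 breaching datapoints, rather than alarming on any single spike, is one common way to trade a little detection speed for less alarm noise.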
Improve developer productivity
Traditional debugging, such as analyzing logs or setting breakpoints in code, is tedious, repetitive, and time-consuming, and it doesn’t scale well to production applications or to those built on a microservices or serverless architecture. To analyze performance across distributed applications, developers need correlated metrics and traces to identify user impact from any source and to find broken or expensive code paths as quickly as possible. And they need to do all this without having to re-instrument their code when they add new observability tools to their kit.
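Correlated traces are what make it possible to find expensive or broken code paths. As a minimal Python sketch, assuming a generic span record (trace ID, span ID, parent ID, name, duration, error flag) rather than the schema of AWS X-Ray or any other tracing product, the hypothetical helpers below rank operations by total self time and list the root-to-span path of every failing span. Self time here is a span's duration minus its direct children's durations, which assumes children run sequentially.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical generic span record, not any product's schema."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float
    error: bool = False


def self_time_by_operation(spans: List[Span]) -> List[Tuple[str, float]]:
    """Rank operations by total self time (own duration minus direct children)."""
    child_time: Dict[Tuple[str, str], float] = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_time[(s.trace_id, s.parent_id)] += s.duration_ms
    totals: Dict[str, float] = defaultdict(float)
    for s in spans:
        # Assumes sequential children; parallel children could make this negative.
        totals[s.name] += s.duration_ms - child_time[(s.trace_id, s.span_id)]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def error_paths(spans: List[Span]) -> List[str]:
    """Return 'root > ... > failing span' for every span flagged as an error."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    paths = []
    for s in spans:
        if not s.error:
            continue
        names = []
        current: Optional[Span] = s
        # Walk parent links up to the root span of the same trace.
        while current is not None:
            names.append(current.name)
            if current.parent_id is None:
                current = None
            else:
                current = by_id.get((current.trace_id, current.parent_id))
        paths.append(" > ".join(reversed(names)))
    return paths


# Hypothetical toy data: two traces of the same request.
spans = [
    Span("t1", "a", None, "GET /checkout", 250),
    Span("t1", "b", "a", "payment.authorize", 180),
    Span("t1", "c", "b", "db.query", 150),
    Span("t2", "d", None, "GET /checkout", 90),
    Span("t2", "e", "d", "payment.authorize", 60, error=True),
]
print(self_time_by_operation(spans))
print(error_paths(spans))
```

In this toy data, `db.query` dominates self time even though `GET /checkout` has the longest total duration, which is the kind of signal that points a developer at the slow code path rather than at its caller.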
Get more insight with visualizations
Observability, especially at cloud scale, can generate huge volumes of data that become difficult for humans to parse. Visualization tools help people make sense of this data by correlating it into intuitive graphical displays. However, graphs and charts scattered across multiple tools and displays become a problem of their own. It’s crucial to be able to centralize visual data in a single dashboard, giving you a unified view of critical information about your system and its performance.
Mapbox is an open-source mapping platform for custom-designed maps that reaches more than 300 million people each month. Mapbox uses Amazon CloudWatch to ingest multiple data sources, including native AWS metrics, custom metrics, and logs, as well as to monitor and visualize key workloads and optimize resources.
“We were looking to consolidate all our monitoring, logging, metrics, and alerting under one tool. CloudWatch has helped us alleviate the operational burden to set up, configure, and learn third-party systems. Our teams use CloudWatch extensively to monitor error rates and status codes for multiple high-profile workloads. We also use CloudWatch to automate Auto Scaling actions, allowing us to optimize the cost of Amazon EC2 instance types powering our Amazon ECS clusters. CloudWatch Events enable us to provide utilization and pricing information to teams so they can audit account security, trigger AWS Lambda actions for compliance and security use cases, and schedule our resources using the cloud. CloudWatch enables next-level automation and expands the capacity of each individual.”
Emily McAfee, Platform Engineering Manager - Mapbox
Pushpay’s purpose is to bring people together by strengthening community, connection, and belonging. We build world-class giving and mobile app publishing solutions to help organizations grow their communities.
“Our current log analytics solution requires setup and maintenance overhead, has differing retention requirements, and is cost prohibitive, making it impossible for our Engineering team to be able to access and query logs in both development and test environments. With CloudWatch Logs Insights, we are now able to query logs within CloudWatch Logs reducing operational complexity. Pay per query gives us flexibility to scale at our own pace and our engineers can begin to consume and query logs without waiting for the setup, integration, and ingestion to take place with our current solution. We also benefit from viewing metrics and logs allowing faster troubleshooting. Logs Insights is an effective and inexpensive solution for our engineers to monitor their applications and perform log diving all from a single AWS console.”
Peter Goodman, Director Site Reliability Engineering - Pushpay
SendGrid is a provider of cloud email and sends more than 40 billion emails each month for more than 69,000 paying customers. SendGrid adopted Amazon CloudWatch early in its migration to AWS in order to gain system visibility, operational insights, and resource optimization.
“CloudWatch allows us to collect metrics from AWS services such as Amazon EC2, Amazon Kinesis, Amazon DynamoDB, and Amazon API Gateway, as well as logs from AWS Lambda functions. We appreciated being able to integrate natively, without the need for a self-managed stack or third-party SaaS vendor. This helped us start alerting, auto scaling, and capacity planning very quickly. Being able to address our primary use cases quickly and simply made CloudWatch a preferred solution.”
Joshua Barratt, Architect II - SendGrid
Learn observability hands-on
Check out the interactive and immersive One Observability Workshop to get hands-on experience with Amazon CloudWatch and AWS X-Ray. In the workshop, you will deploy a complex microservices application and set up monitoring and observability in a modern environment. You will come away with a clear understanding of logging, metrics, container and serverless monitoring, and tracing techniques.
Monitoring and observability in AWS (podcast)
Monitoring containerized applications with CloudWatch (infographic)
Application health and performance monitoring with CloudWatch (blog post)
Reducing alarm noise using CloudWatch composite alarms (blog post)
How to get one view of application health with CloudWatch and X-Ray (tech talk)
Container monitoring and anomaly detection with CloudWatch (eBooklet)