AWS Cloud Operations Blog
Category: Monitoring and observability
Top 10 AWS Cloud Operations and Migrations Blog posts of 2022
With 2022 behind us, we want to take the opportunity to thank our readers and highlight the top blog posts from 2022. A big thank you to all our readers, and also to our authors, who continue to delight our customers with their blog posts. #1 Announcing AWS CloudTrail Lake – a managed audit and […]
Monitoring the status of Windows services with Amazon CloudWatch
When you have an application that relies on a specific Windows service being up and running, knowing the status of this service can be a useful part of your observability solution. This service status data can be displayed on dashboards, used to create alarms, or used to trigger automated resolutions. This post presents a solution […]
Visualizing Amazon CloudWatch Costs – Part 2 – Where does the data come from?
In part 1 of this series, we explored an Amazon CloudWatch dashboard that provides a real-time view of some of the main contributors to CloudWatch costs. In this second post, we’ll look at how the CloudWatch dashboard widgets were created so that you can learn how to create something similar, or modify the widgets […]
Visualizing Amazon CloudWatch Costs – Part 1
Amazon CloudWatch monitors your AWS resources and the applications you run on AWS in real time. You can use CloudWatch to collect metrics, logs, and traces, set up alarms, create synthetic checks, and more. The information you collect lets you observe, validate, and alert on areas of interest to you. In this two-part post, we’ll explore a […]
Know Before You Go – AWS re:Invent 2022 Monitoring & Observability
Whether you are building out applications in the cloud, modernizing your environment, or migrating workloads, observability is vital to your success. Monitoring and observability provide operational visibility and insight into your workloads and are crucial to operational excellence. AWS Observability will be at re:Invent 2022 to share how you can leverage observability for your organization. […]
How to Monitor Databricks with Amazon CloudWatch
This post was written by Lei Pan and Sajith Appukuttan from Databricks. In this post, we look closely at monitoring and alerting systems – both critical components of any production-level environment. We’ll start with a review of the key reasons why engineers should build a monitoring/alerting system for their environment, the benefits, as well as […]
Deploy Multi-Account Amazon CloudWatch Dashboards
Organizations building modern applications need a way to gain actionable insights into their Amazon Elastic Compute Cloud (Amazon EC2) workloads. Amazon CloudWatch is a monitoring and observability service that collects operational data from logs, metrics, and events. The service lets you monitor resources spread across different accounts or Regions in a single view, visualize […]
Viewing custom metrics from statsd with Amazon Managed Service for Prometheus and Amazon Managed Grafana
Monitoring applications based on custom metrics is important for a resilient system. One mechanism for generating custom metrics from applications is statsd – a Node.js process that periodically collects custom application performance metrics. However, statsd doesn’t provide long-term storage, rich querying, visualization, or an alerting solution. Amazon Managed Service for Prometheus and Amazon […]
Viewing collectd statistics with Amazon Managed Service for Prometheus and Amazon Managed Service for Grafana
Monitoring systems are essential for a resilient solution. A popular tool for monitoring Linux-based physical or virtual machines is collectd – a daemon that periodically collects system and application performance metrics. However, collectd doesn’t provide long-term storage for metrics, rich querying, visualization, or an alerting solution. The Amazon Managed Service for Prometheus is a serverless […]
What is observability and why does it matter? – Part 1
Before defining observability, consider the following example: You run an e-commerce site, and you’re interested in understanding the customer experience of the site, as well as how that translates into sales. You have identified that long page-loading times lead to poor customer experience, which in turn leads customers to abandon their carts and buy competing […]