AWS Cloud Operations & Migrations Blog

Tag: Monitoring

Using Tag-Based Filtering to Manage AWS Health Monitoring and Alerting at Scale

AWS sends customers regular service notifications and updates on planned activities via e-mail to the root account owners or the operational, security, and billing contacts. AWS also provides granular notifications to customers via AWS Health, allowing them to fine-tune their alerts on issues relating directly to them. Alongside Health Dashboard’s monitoring capabilities, customers can also […]

Detecting gray failures with outlier detection in Amazon CloudWatch Contributor Insights

You may have encountered a situation in the past where a single user or small subset of users of your system was reporting an event that was impacting their experience, but your observability systems didn’t show any clear impact. The discrepancy between the customer’s experience and the system’s observation of its health is referred to […]

Setup memory metrics for Amazon EC2 instances using AWS Systems Manager

Amazon Elastic Compute Cloud (Amazon EC2) emits several metrics for your EC2 instance to Amazon CloudWatch. However, memory metrics aren’t among the default metrics provided by Amazon EC2. Several memory-heavy applications, such as big data analytics, in-memory databases, and real-time streaming, require you to monitor memory utilization on the instances for operational visibility. These applications […]

Simplify analysis of AWS CloudTrail data leveraging Amazon CloudWatch machine learning and advanced capabilities

AWS CloudTrail tracks user and API activities across AWS environments for governance and auditing purposes and allows customers to centralize a record of these activities. Customers have the option to send AWS CloudTrail logs to Amazon CloudWatch, which simplifies and streamlines the analysis and monitoring of activities recorded by AWS CloudTrail. Amazon CloudWatch anomaly detection allows […]

Announcing Amazon Managed Grafana workspace version selection with version 9.4 support

Many customers that use Amazon Managed Grafana have requested the ability to choose a Grafana version with the latest product features, including navigation, dashboards, and visualizations. Today, we are announcing Amazon Managed Grafana workspace version selection with version 9.4 support. Since the product was launched, Amazon Managed Grafana has maintained a single version offering globally. […]

How Audible used Amazon CloudWatch cross-account observability to resolve severity tickets faster

This blog was co-written with Audible’s Apurva Jatakia, Kaushik S., and David Etler. Audible’s consumption services platform serves thousands of requests every second, and each incoming request is served by a distributed set of microservices owned by different teams. An Audible team, in charge of a platform called Stagg, is responsible for five separate microservices. […]

Announcing inbound network access control in Amazon Managed Grafana

Many customers that use Amazon Managed Grafana need to restrict public access to the Grafana workspace and to have fine-grained control over which traffic sources can reach it. Today, we are announcing Amazon Managed Grafana’s new feature that supports inbound network access control. This enables you to secure Grafana workspaces using VPC […]

What’s new in AWS Observability at re:Invent 2022

Kick off your AWS re:Invent 2022 week with a round-up of the AWS Observability launches across Amazon CloudWatch, AWS X-Ray, Amazon Managed Grafana, and Amazon Managed Service for Prometheus. From understanding the impact of internet issues on your application performance and availability with CloudWatch, to VPC support and Prometheus alerting in Managed Grafana, read on to […]

Viewing custom metrics from statsd with Amazon Managed Service for Prometheus and Amazon Managed Grafana

Monitoring applications based on custom metrics is important for a resilient system. One of the mechanisms to generate custom metrics from applications is statsd, a Node.js process that periodically collects custom application performance metrics. However, statsd doesn’t provide long-term storage, rich querying, visualization, or an alerting solution. Amazon Managed Service for Prometheus and Amazon […]