AWS Cloud Operations Blog

Tag: Monitoring

Optimize AWS Resource Management with Tag Inventory Reports leveraging AWS Resource Explorer

Customers are increasingly seeking an efficient solution to manage their expanding AWS resources, spanning AWS accounts and Regions, amidst changes like mergers, acquisitions, and cloud migrations. AWS Tags offer an effective solution for organizing, identifying, and filtering resources by categorizing them based on criteria such as purpose, owner, or environment. AWS customers would like to […]

Multi-tenant monitoring across accounts and regions using Amazon Managed Service for Prometheus

Multi-tenant monitoring across accounts and regions using Amazon Managed Service for Prometheus

In this guest blog post, Nauman Noor (Managing Director), Fabio Dias (Cloud Developer), and Dylan Alibay (Cloud Developer) from the platform engineering team at State Street discuss their use of Amazon Managed Prometheus and AWS Distro for OpenTelemetry to enable monitoring in a multi-tenant, multi-account, and multi-region environment. In the ever-evolving financial services landscape, State […]

Automate the creation of AWS Support cases using Amazon CloudWatch alarms and Amazon Bedrock

Automate the creation of AWS Support cases using Amazon CloudWatch alarms and Amazon Bedrock

For production applications, the Mean-Time-To-Recovery (MTTR) is critical. In line with this, AWS offers Business, Enterprise On-Ramp and Enterprise support plans where AWS customers can benefit from shorter response time for cases related to production and business critical workloads. However, without having an automated way to notify AWS support, creating a case is a manual […]

Using Tag-Based Filtering to Manage AWS Health Monitoring and Alerting at Scale

AWS provides customers regular updates of service notifications and planned activities via e-mail to the root account owners or the operational, security and billing contacts. AWS also provides granular notifications to customers via AWS Health allowing them to fine-tune their alerts on issues relating directly to them. Alongside Health Dashboard’s monitoring capabilities, customers can also […]

Detecting gray failures with outlier detection in Amazon CloudWatch Contributor Insights

You may have encountered a situation in the past where a single user or small subset of users of your system are reporting an event that is impacting their experience, but your observability systems didn’t show any clear impact. The discrepancy between the customer’s experience and the system’s observation of its health is referred to […]

Setup memory metrics for Amazon EC2 instances using AWS Systems Manager

Amazon Elastic Compute Cloud (Amazon EC2) emits several metrics for your EC2 instance to Amazon CloudWatch. However, memory metrics isn’t one of the default metrics provided by Amazon EC2. Several memory heavy applications like Big Data Analytics, In-memory Databases, Real-time Streaming require you to monitor memory utilization on the instances for operational visibility. These applications […]

Simplify analysis of AWS CloudTrail data leveraging Amazon CloudWatch machine learning and advanced capabilities

AWS CloudTrail tracks user and API activities across AWS environments for governance and auditing purposes and allows customers to centralize a record of these activities. Customers have the option to send AWS CloudTrail logs to Amazon CloudWatch that simplifies and streamlines the analysis and monitoring of AWS CloudTrail recorded activities. Amazon CloudWatch anomaly detection allows […]

Announcing Amazon Managed Grafana workspace version selection with version 9.4 support

Many customers that use Amazon Managed Grafana have requested for the ability to choose a Grafana version with the latest product features including navigation, dashboards, and visualizations. Today, we are announcing Amazon Managed Grafana workspace version selection with version 9.4 support. Since the product was launched, Amazon Managed Grafana maintained a single version offering globally. […]

How Audible used Amazon CloudWatch cross-account observability to resolve severity tickets faster

This blog was co-written with Audible’s Apurva Jatakia, Kaushik S., and David Etler. Audible’s consumption services platform serves thousands of requests every second, and each incoming request is served by a distributed set of microservices owned by different teams. An Audible team, in charge of a platform called Stagg, is responsible for five separate microservices. […]