Amazon Managed Service for Prometheus | AWS Cloud Operations Blog

This Month in AWS Observability: June 2026

Welcome to the latest edition of This Month in AWS Observability, featuring what’s new across Amazon CloudWatch and AI-driven operations this June! Native OpenTelemetry metrics with PromQL querying are now generally available in CloudWatch, 23 new Logs Insights commands have launched for deeper statistical and structured analysis, Session Replay is now in CloudWatch RUM, and AWS […]

AWS Observability ICYMI: Jan-May 2026

Welcome to the first edition of the AWS Observability ICYMI (In Case You Missed It) recap! The first five months of 2026 have been transformational for AWS observability, with over 40 launches across Amazon CloudWatch, AWS X-Ray, Amazon Managed Grafana, and Amazon Managed Service for Prometheus. Two major themes defined this period: OpenTelemetry as the […]

Simplifying Prometheus metrics collection across your AWS infrastructure

If you’re running services such as Amazon EC2 instances, Amazon Elastic Container Service (Amazon ECS) containers, and Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in AWS, maintaining separate Prometheus servers for each environment creates significant operational burden. Managing scraper configurations, high availability, scaling, and security distracts you from building great applications. AWS managed […]

Alerting Best Practices with Amazon Managed Service for Prometheus

Alerts connect telemetry to action. Effective alert management helps you detect problems quickly, maintain resilience, and build customer trust. So, what is the best way to manage alerts when storing metrics in Amazon Managed Service for Prometheus? In this blog post, you will learn how to create, route, and administer alerting rules in Amazon […]

Salesforce Commerce Cloud migrates from Self-hosted Prometheus to Amazon Managed Service for Prometheus

Salesforce Commerce Cloud empowers thousands of retailers worldwide to create seamless shopping experiences. Behind these experiences lies a complex infrastructure that demands reliable monitoring at scale. As the platform evolved from static, first-party instances to dynamic cloud-based environments, the monitoring needs outgrew the self-managed Prometheus solution. This post details Salesforce’s Commerce Cloud journey from […]

Optimizing metrics ingestion with Amazon Managed Service for Prometheus

Managing metrics collection at scale in complex cloud environments presents significant challenges for organizations, particularly when it comes to controlling costs and maintaining operational efficiency. As the volume of metrics grows exponentially with the expansion of container deployments and other cloud-native workloads, customers often struggle to balance comprehensive monitoring with resource optimization. This can lead […]

Monitor EBS Detailed Performance Statistics with Amazon Managed Service for Prometheus

Today we are excited to announce that you can now easily ingest Amazon EBS detailed performance statistics from your Amazon Elastic Kubernetes Service (Amazon EKS) workloads into an Amazon Managed Service for Prometheus workspace. We recently announced the availability of EBS detailed performance statistics, which gives you real-time visibility into the performance of your EBS […]

How Stripe architected massive scale observability solution on AWS

This post is co-written with Cody Rioux, Staff Engineer at Stripe, and Michael Cowgill, Staff Engineer at Stripe. Stripe powers online and in-person payment processing and provides financial solutions for businesses of all sizes. Stripe operates a sophisticated microservice environment built on top of AWS. In this blog post we will cover the journey and […]

Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers

Managing and operating monitoring systems for containerized applications, including metrics collection, can be a significant operational burden for customers. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]

Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock

As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]
