AWS Cloud Operations & Migrations Blog

Category: Amazon Managed Service for Prometheus

Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics

With the rising popularity of applications hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a key challenge is handling increases in traffic and load efficiently. Traditionally, you would have to manually scale out your applications by adding more instances – an approach that’s time-consuming, inefficient, and prone to over- or under-provisioning. A better […]

VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus

VTEX is a multi-tenant platform with a distributed engineering operation. Observing hundreds of services in real time in an efficient manner is a technical challenge for the business. In this blog, we will show how VTEX created a resilient open source-based architecture aligned with a sharding strategy, using Amazon Managed Service for Prometheus (AMP) to […]

How Unitary achieved automatic metric collection with Amazon Managed Service for Prometheus collector

This post was co-authored with Nicolas Fournier, Platform Engineer at Unitary. Every day, over 80 years’ worth of video content is uploaded online. Some of this content can also be harmful. Unitary knows that human moderators are the current gold standard for moderation, but this manual approach does not scale. While automated systems can scale, […]

Multi-tenant monitoring across accounts and regions using Amazon Managed Service for Prometheus

In this guest blog post, Nauman Noor (Managing Director), Fabio Dias (Cloud Developer), and Dylan Alibay (Cloud Developer) from the platform engineering team at State Street discuss their use of Amazon Managed Service for Prometheus and AWS Distro for OpenTelemetry to enable monitoring in a multi-tenant, multi-account, and multi-region environment. In the ever-evolving financial services landscape, State […]

What’s new in AWS Observability at re:Invent 2023

Let’s recap the week at AWS re:Invent 2023 with a round-up of the AWS Observability launches across Amazon CloudWatch, Amazon Managed Grafana, and Amazon Managed Service for Prometheus. From automatic instrumentation and operation of applications in CloudWatch, to agentless scraping of Prometheus metrics in Managed Service for Prometheus, read on to learn about the features […]

Monitoring and Visualizing Amazon EKS signals with Kiali and AWS managed open-source services

Microservices architecture enables scalability and agility for modern applications. However, distributed systems can introduce complexity when troubleshooting issues across services on different machines. To gain observability into microservices environments, operators need tools to monitor, analyze, and debug the interconnected services. Istio service mesh connects, secures, and observes microservices communications. It provides a way to manage […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Monitor Amazon EKS Control Plane metrics using AWS Open Source monitoring services

Have you encountered situations where your Kubernetes API calls are constantly throttled by the control plane? Have you seen the 429 HTTP response code “Too many requests” all over the place and had no clue what’s wrong with your cluster? In this blog post, we will talk about monitoring some of the key metrics […]

How to reduce Istio sidecar metric cardinality with Amazon Managed Service for Prometheus

The complexity of distributed systems has grown significantly, making monitoring and observability essential for application and infrastructure reliability. As organizations adopt microservice-based architectures and large-scale distributed systems, they face the challenge of managing an increasing volume of telemetry data, particularly high metric cardinality in systems like Prometheus. To address this, many are turning to service […]

Choice Hotels adopts Amazon Managed Service for Prometheus for operational excellence and cost efficiency

This post was co-written with Stephen Cihak, Senior Director; Abhiram Madadi, Principal Engineer; and Gopi Akula, Senior Manager at Choice Hotels. Who is Choice Hotels? Choice Hotels International is one of the largest lodging franchisors in the world. A challenger in the upscale segment and a leader in midscale and extended stay, Choice has […]