AWS Cloud Operations & Migrations Blog

Tag: Amazon Managed Prometheus

Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics

With the rising popularity of applications hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a key challenge is handling increases in traffic and load efficiently. Traditionally, you would have to manually scale out your applications by adding more instances, an approach that’s time-consuming, inefficient, and prone to over- or under-provisioning. A better […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Monitor your Databricks Clusters with AWS managed open-source Services

In today’s data-driven world, organizations rely heavily on cloud-based data processing and analytics platforms to unlock valuable insights and make informed decisions. Databricks, a unified analytics platform, has emerged as a popular choice due to its seamless integration with Apache Spark and its ability to efficiently handle large-scale data processing tasks. Many customers have implemented […]

Enhance Operational Insight by Converting the Output of any AWS SDK Commands to Prometheus Metrics

Have you ever wished you had the output of an AWS command to enrich your dashboards or alerts? The AWS control plane contains a rich set of information that can be operationally insightful! Recently I encountered a customer running multiple Amazon Elastic Kubernetes Service (Amazon EKS) clusters in an IP-constrained environment. When a subnet […]

Monitor Istio on EKS using Amazon Managed Prometheus and Amazon Managed Grafana

Service meshes are an integral part of the Kubernetes environment, enabling secure, reliable, and observable communication. Istio is an open-source service mesh that provides advanced network features without requiring any changes to the application code. These capabilities include service-to-service authentication, monitoring, and more. Istio generates detailed telemetry for all service communications within a mesh. This telemetry […]