AWS Cloud Operations Blog

Category: Amazon Elastic Kubernetes Service

Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock

As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]

Enhancing observability with a managed monitoring solution for Amazon EKS

Enhancing observability with a managed monitoring solution for Amazon EKS

Introduction Keeping a watchful eye on your Kubernetes infrastructure is crucial for ensuring optimal performance, identifying bottlenecks, and troubleshooting issues promptly. In the ever-evolving world of cloud-native applications, Amazon Elastic Kubernetes Service (EKS) has emerged as a popular choice for deploying and managing containerized workloads. However, monitoring Kubernetes clusters can be challenging due to their […]

How to automate application log ingestion from Amazon EKS on Fargate into AWS CloudTrail Lake

How to automate application log ingestion from Amazon EKS on Fargate into AWS CloudTrail Lake

Customers often look for options to capture and centralized storage of application logs from Amazon Elastic Kubernetes Service on Fargate (Amazon EKS on Fargate) Pods to investigate root causes or analyze security incidents. Customers also like the capability to easily query the logs to assist with security investigations. In this blog post, we show you […]

How StormForge reduces complexity and ensures scalability with Amazon Managed Service for Prometheus

This blog post was co-written by Brent Eager, Senior Software Engineer, StormForge StormForge is the creator of Optimize Live, a Kubernetes vertical rightsizing solution that is compatible with the Kubernetes HorizontalPodAutoscaler (HPA). Using cluster-based agents, machine learning, and Amazon Managed Service for Prometheus, Optimize Live is able to continuously calculate and apply optimal resource requests, […]

Unlocking Insights: Turning Application Logs into Actionable Metrics

Modern software development teams understand the importance of observability as a critical aspect of building reliable and resilient applications. By implementing observability practices, software teams can proactively identify issues, uncover performance bottlenecks, and enhance system reliability. However, it is a fairly recent trend and still lacks industry-wide adoption. As organizations standardize on containers, they often […]

Announcing Amazon CloudWatch Container Insights for Amazon EKS Windows Workloads Monitoring

Monitoring containerized applications requires precision and efficiency. As your applications scale, collecting and summarizing application and infrastructure metrics from your applications can be challenging. One way to handle this challenge is using Amazon CloudWatch Container Insights which is a single-click native monitoring tool provided by AWS. Amazon CloudWatch Container Insights helps customers collect, aggregate, and summarize […]

Enhance Kubernetes Operational Visibility with AWS Chatbot

Many customers run their mission critical container workloads on Amazon Web Services (AWS)  using Amazon Elastic Kubernetes Service (Amazon EKS). One of the key focus areas for them is to analyze and act on operational events quickly. Getting real-time visibility into performance issues, traffic spikes and infrastructure events can enable teams to quickly address issues and […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2

Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2

Amazon CloudWatch Container Insights is a fully managed monitoring and observability service that provides DevOps engineers, developers, SREs, and IT managers with out-of-the-box visibility into their containerized applications and microservice environments. With Amazon CloudWatch Container Insights, you can monitor, isolate, and diagnose issues in your Kubernetes clusters with minimal effort. It delivers infrastructure telemetry like […]

Monitor Amazon EKS Control Plane metrics using AWS Open Source monitoring services

Have you encountered situations where your Kubernetes API calls are constantly throttled by the control plane? Did you see the 429 HTTP response code “Too many requests” all over the place and have no clue on what’s wrong with your cluster? In this blog post, we will talk about monitoring some of the key metrics […]