AWS Cloud Operations Blog

Author: Vikram Venkataraman

Vikram Venkataraman is a Principal Solution Architect at Amazon Web Services and a container enthusiast. He helps organizations with best practices for running workloads on AWS. In his spare time, he loves to play with his two kids and follows cricket.

Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock

As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]

Using Amazon Q Business to streamline your operations

Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. You can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. Amazon Q provides immediate, relevant information […]

Announcing Amazon CloudWatch Container Insights for Amazon EKS Windows Workloads Monitoring

Monitoring containerized applications requires precision and efficiency. As your applications scale, collecting and summarizing application and infrastructure metrics can be challenging. One way to handle this challenge is to use Amazon CloudWatch Container Insights, a single-click native monitoring tool provided by AWS. Amazon CloudWatch Container Insights helps customers collect, aggregate, and summarize […]

Monitor Amazon EKS Control Plane metrics using AWS Open Source monitoring services

Have you encountered situations where your Kubernetes API calls are constantly throttled by the control plane? Have you seen the 429 HTTP response code “Too many requests” all over the place and had no clue what’s wrong with your cluster? In this blog post, we will talk about monitoring some of the key metrics […]

Choice Hotels adopts Amazon Managed Service for Prometheus for operational excellence and cost efficiency

This post was co-written with Stephen Cihak, Senior Director; Abhiram Madadi, Principal Engineer; and Gopi Akula, Senior Manager, at Choice Hotels. Who is Choice Hotels? Choice Hotels International is one of the largest lodging franchisors in the world. A challenger in the upscale segment and a leader in midscale and extended stay, Choice has […]

Monitoring CoreDNS for DNS throttling issues using AWS Open source monitoring services

Monitoring infrastructure and applications is essential today, as it gives operations engineers the information they need to keep the technology stack healthy and achieve business outcomes. To build a microservices environment using a container orchestration tool like Kubernetes, which is designed to increase flexibility and agility, there are many distributed parts that have to […]

Announcing AWS Observability Accelerator to configure comprehensive observability for Amazon EKS

In May 2022, we announced Amazon EKS Observability Accelerator, a tool for configuring and deploying a purpose-built observability solution on Amazon Elastic Kubernetes Service (Amazon EKS) clusters for specific workloads using Terraform modules. We launched this tool demonstrating four use cases, and customers have been using it to achieve observability rapidly. Customers can use […]

Introducing Amazon EKS Observability Accelerator

Some of the details in this blog post are now outdated. For the latest information on the AWS Observability Accelerator, please see “Announcing AWS Observability Accelerator to configure comprehensive observability for Amazon EKS.” Also explore the GitHub repository, where you can find more details on how to get started. Observability is critical for any application […]

Proactive autoscaling of Kubernetes workloads with KEDA using metrics ingested into Amazon Managed Service for Prometheus

UPDATE: This blog post has been updated to include information about the recently added support for KEDA with the Amazon Managed Service for Prometheus (AMP). Orchestration platforms such as Amazon EKS and Amazon ECS have simplified the process of building, securing, operating, and maintaining container-based applications, thereby helping organizations focus on building applications. We simplified this further […]

Using Prometheus Adapter to autoscale applications running on Amazon EKS

Automated scaling is an approach to scaling workloads up or down automatically based on resource usage. In Kubernetes, the Horizontal Pod Autoscaler (HPA) can scale pods based on observed CPU utilization and memory usage. In more complex scenarios, we would account for other metrics before deciding how to scale. For example, most web and mobile backends […]