AWS Cloud Operations Blog

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2

Amazon CloudWatch Container Insights is a fully managed monitoring and observability service that provides DevOps engineers, developers, SREs, and IT managers with out-of-the-box visibility into their containerized applications and microservice environments. With Amazon CloudWatch Container Insights, you can monitor, isolate, and diagnose issues in your Kubernetes clusters with minimal effort. It delivers infrastructure telemetry like […]

Know Before You Go — AWS re:Invent 2023 Monitoring and Observability, and Centralized Operations Management

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

How to email your Amazon CloudWatch dashboard

Amazon CloudWatch enables customers to collect monitoring and operational data in the form of logs, metrics, alarms, and events, making it easy to visualize workloads and send notifications. Many customers use Amazon CloudWatch dashboards to get a unified view of their applications and infrastructure. Traditionally, operational health data was only viewable for […]

Automating Amazon EC2 Auto Scaling with Amazon CloudWatch custom metrics and AWS CDK

As customers migrate legacy workloads to the AWS Cloud, they may need to rehost or replatform applications onto Amazon EC2 instances. To benefit from the scalability of the cloud, customers need to be able to scale these EC2 instances up or down, on demand and on schedule. Amazon EC2 Auto Scaling groups provide the on-demand scaling […]

Scaling GitHub usage with AWS

Customers that migrate on-premises enterprise applications to AWS often look for guidance on how to migrate GitHub to AWS. These customers find it challenging to scale because they are constrained by their on-premises GitHub infrastructure. Organisations that run GitHub on AWS can get up and running quickly. GitHub on AWS enables teams to collaborate efficiently […]

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Customers running microservice-based workloads in a serverless environment frequently struggle to troubleshoot incidents, because the data they need can be distributed across hundreds or thousands of components. In this blog post, I will demonstrate how you can reduce the mean time to resolution (MTTR, or the average time it takes to repair or mitigate […]

Unlocking the power: The keys to delivering successful Cloud Migrations

Despite the many benefits of moving to the Cloud, large enterprises frequently struggle to deliver migrations (and the related business transformation) in the planned timeframe. Why? What are the key factors that make a migration succeed and become an oft-quoted industry benchmark for Cloud-driven transformation, rather than a moribund initiative where a number […]

Self-service Account Provisioning Using AWS Service Management Connector for ServiceNow

Many customers are looking to adopt a multi-account strategy within their AWS environment. This allows customers to isolate their workloads into separate environments, such as test, dev, and production, and to separate workloads based on regulatory requirements. As customers scale their multi-account environments, one strategy to increase agility is to offer business units their own […]