AWS Cloud Operations Blog

Understanding AWS High Availability and Replication for vSphere Administrators

Introduction vSphere HA is a fundamental and frequently used feature of vSphere. If any of several failure scenarios occur, it restarts a virtual machine. The failure scenarios range from VM or host crashes to unresponsive hosts (for example, due to network isolation or outage). Translating vSphere High Availability (HA) to the public cloud can be […]

Configuring machine to machine Authentication with Amazon Cognito and Amazon API Gateway - Part 2

Configuring machine to machine Authentication with Amazon Cognito and Amazon API Gateway – Part 2

This blog is the second part to a 2 part series on how to secure your Amazon API Gateway with Amazon Cognito, in machine to machine (M2M) communication use cases. In the previous blog post, we dove deep into the different use cases involving M2M communication and how it contributes to business modernization, and why […]

Configuring machine to machine Authentication with Amazon Cognito and Amazon API Gateway – Part 1

Introduction When we think about modernization, we’re used to think about the process of breaking down a monolithic application, or moving to a microservices architecture. But let’s think for a moment on the business side. For example, think about the challenges and risks involved in moving information over phone calls or emails. We want to […]

Empowering Manufacturing Innovation: How AI & GenAI Centers of Excellence can drive Modernization

Introduction Technologies such as machine learning (ML), artificial intelligence (AI), and Generative AI (GenAI) unlock a new era of efficient and sustainable manufacturing while empowering the workforce. Areas where AI can be applied in manufacturing include predictive maintenance, defect detection, supply chain visibility, demand forecasting, product design, and many more. Benefits include improving uptime and […]

Optimize your cloud deployments with Prioritized Trusted Advisor recommendations in your operational workflows

AWS Trusted Advisor Priority helps you focus on the most important recommendations for optimizing your cloud deployments, improving resilience, and addressing security gaps. As an AWS Enterprise Support customer, you gain access to prioritized and context-driven recommendations, curated both by your AWS account team and machine-generated checks from AWS services. Note: AWS Trusted Advisor Priority […]

Reduce code duplication in load testing and synthetic monitoring using Amazon CloudWatch Synthetics

Load testing is an integral step in the quality assurance phase of a software development lifecycle, that offers you confidence about the performance of your workload before it is deployed to production. Once that workload moves to production, you monitor its health using synthetic monitoring. Load testing and synthetic monitoring typically test the same application […]

Use Amazon CloudWatch Contributor Insights for general analysis of Apache logs

Customers build, deploy, and maintain millions of web applications on AWS and many customers deploy these applications using the Apache web application server. Web application performance is a key metric in modern enterprise applications. On AWS customers leverage Amazon CloudWatch to monitor response times, uptime, and provide SLAs. Engineering teams that run large scale applications […]

Gain operational insights for NVIDIA GPU workloads using Amazon CloudWatch Container Insights

As machine learning models grow more advanced, they require extensive computing power to train efficiently. Many organizations are turning to GPU-accelerated Kubernetes clusters for both model training and online inference. However, properly monitoring GPU usage is critical for machine learning engineers and cluster administrators to understand model performance and to optimize infrastructure utilization. Without visibility […]

Get Disk Utilization of Your Fleet Using AWS Systems Manager Custom Inventory Types

Get Disk Utilization of Your Fleet Using AWS Systems Manager Custom Inventory Types

Some of my customers need assistance while operating their Amazon Elastic Compute Cloud (Amazon EC2) infrastructure. They need to: Review the disk usage of various volumes/ disks within an EC2 instance. To do it in a scalable way, one does not need to access the instance either through a Remote Desktop Session (RDP) or use […]

Resiliency Journey : exploring how AWS Resilience Hub and Migration Acceleration Program come together

In today’s rapidly evolving digital landscape, the cloud has become the backbone of innovation, scalability, and efficiency for businesses worldwide. As customers embark on their cloud migration journeys, whether the migration has been motivated by the intention of accelerating innovation, reducing operational and infrastructure costs, or exiting your on-prem datacenter, migrating to the cloud presents […]