AWS Cloud Operations Blog

Phani Kumar Lingamallu

Author: Phani Kumar Lingamallu

Phani Kumar Lingamallu is a Senior Solutions Architect with Amazon Web Services. He works with AWS partners to build solutions, providing architectural guidance for scalable architectures and strategies to drive adoption of AWS services. He is a technology enthusiast and an author of the AWS Observability handbook, with core areas of interest in Cloud Operations and Observability.

Gain operational insights for NVIDIA GPU workloads using Amazon CloudWatch Container Insights

As machine learning models grow more advanced, they require extensive computing power to train efficiently. Many organizations are turning to GPU-accelerated Kubernetes clusters for both model training and online inference. However, properly monitoring GPU usage is critical for machine learning engineers and cluster administrators to understand model performance and to optimize infrastructure utilization. Without visibility […]