AWS Machine Learning Blog

Category: Amazon CloudWatch

Use Amazon CloudWatch custom metrics for real-time monitoring of Amazon SageMaker model performance

Training deep learning (DL) models can be expensive and time-consuming. It’s important for data scientists to monitor model metrics such as training accuracy, training loss, validation accuracy, and validation loss, and to make informed decisions based on those metrics. In this blog post, I’ll show you how to […]

Monitoring GPU Utilization with Amazon CloudWatch

Deep learning requires a large number of matrix multiplications and vector operations, which GPUs (graphics processing units) can parallelize across their thousands of cores. Amazon Web Services lets you spin up P2 or P3 instances that are well suited to running deep learning frameworks such as MXNet, which emphasizes speeding up the deployment […]