AWS Machine Learning Blog

Category: Management Tools

Use Amazon CloudWatch custom metrics for real-time monitoring of Amazon SageMaker model performance

The training and learning process of deep learning (DL) models can be expensive and time-consuming. It’s important for data scientists to monitor model metrics, such as training accuracy, training loss, validation accuracy, and validation loss, and to make informed decisions based on those metrics. In this blog post, I’ll show you how to […]

AWS CloudTrail integration is now available in Amazon SageMaker

AWS customers have been requesting a way to log activity in Amazon SageMaker to help them meet their governance and compliance needs. I’m happy to announce that Amazon SageMaker is now integrated with AWS CloudTrail, a service that enables you to log, continuously monitor, and retain account information related to Amazon SageMaker API activity. Amazon […]

Monitoring GPU Utilization with Amazon CloudWatch

Deep learning requires a large number of matrix multiplications and vector operations, which GPUs (graphics processing units) can parallelize because they have thousands of cores. Amazon Web Services allows you to spin up P2 or P3 instances that are great for running deep learning frameworks such as MXNet, which emphasizes speeding up the deployment […]