Posted On: Apr 10, 2023

Amazon SageMaker Inference Recommender (IR) helps customers select the best instance type and configuration (such as instance count, container parameters, and model optimizations) for deploying their ML models on SageMaker. Today, we are announcing deeper integration with Amazon CloudWatch for logs and metrics, Python SDK support for running IR jobs, the ability to run IR jobs within a VPC subnet of your choice, support for running load tests against an existing endpoint via a new API, and several usability improvements that make it easier to get started with IR.
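As a minimal sketch of the VPC capability, the request below shows what a CreateInferenceRecommendationsJob payload might look like with the job pinned to a subnet of your choice. The field names follow the SageMaker API; the job name, ARNs, subnet, and security group IDs are placeholders, and the exact `VpcConfig` shape is an assumption based on the announcement above.

```python
# Hypothetical CreateInferenceRecommendationsJob payload; all identifiers are placeholders.
request = {
    "JobName": "my-ir-job",                      # placeholder job name
    "JobType": "Default",                        # "Default" returns instance type recommendations
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    "InputConfig": {
        "ModelPackageVersionArn": (
            "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-model/1"
        ),
        # Assumed shape for running the IR job inside your own VPC subnet:
        "VpcConfig": {
            "Subnets": ["subnet-0123456789abcdef0"],
            "SecurityGroupIds": ["sg-0123456789abcdef0"],
        },
    },
}

# With valid credentials, the payload would be submitted via boto3:
# import boto3
# boto3.client("sagemaker").create_inference_recommendations_job(**request)
```

Keeping the payload as a plain dictionary makes it easy to review or template the VPC settings before submitting the job.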

CloudWatch integration provides IR logs under a new log group for identifying any errors with IR execution. IR now also publishes key metrics such as the number of concurrent users and CPU and memory utilization, in addition to throughput and latency (including P99 latency). Python SDK support lets customers trigger an IR job from Jupyter notebooks to get instance type recommendations. We also launched new APIs that provide detailed visibility into all execution steps of an IR job and an option to load test the model against an existing endpoint. To improve usability, we made several mandatory input parameters optional, and customers are no longer required to register a model or provide inputs such as domain to run an IR job.
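To illustrate the new existing-endpoint load test option, here is a sketch of what such a request payload might look like. The `Endpoints` and `ModelName` fields are assumptions based on the features described above (load testing an existing endpoint, and no longer needing to register a model package); the endpoint name, role ARN, and traffic numbers are placeholders.

```python
# Hypothetical payload for an IR load test against an existing endpoint;
# all identifiers and numbers are placeholders.
load_test_request = {
    "JobName": "my-ir-load-test",
    "JobType": "Advanced",  # load tests use the custom ("Advanced") job type
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    "InputConfig": {
        "ModelName": "my-existing-model",  # model package registration no longer required
        # Assumed field for targeting an endpoint that is already deployed:
        "Endpoints": [{"EndpointName": "my-existing-endpoint"}],
        "JobDurationInSeconds": 1800,  # cap the load test at 30 minutes
        "TrafficPattern": {
            "TrafficType": "PHASES",
            "Phases": [
                # Ramp phase: start with 2 virtual users, add 1 per minute, for 5 minutes.
                {"InitialNumberOfUsers": 2, "SpawnRate": 1, "DurationInSeconds": 300},
            ],
        },
    },
}
```

Because the job targets an endpoint that already exists, IR can report the concurrency, utilization, and latency metrics mentioned above without provisioning new infrastructure for the test.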

For more information about the AWS Regions where SageMaker Inference is available, see the AWS Region table. 

To learn more, visit the Inference Recommender documentation. Amazon SageMaker Inference Recommender charges you only for the underlying resources used. For more information on how to deploy models with SageMaker, see the documentation.