Posted On: Dec 1, 2021

Amazon SageMaker Inference Recommender helps you choose the best available compute instance and configuration to deploy machine learning models for optimal inference performance and cost.

Selecting a compute instance with the best price-performance for deploying machine learning models is a complex, iterative process that can take weeks of experimentation. First, you need to choose the right ML instance type from more than 70 options based on the resource requirements of your models and the size of the input data. Next, you need to optimize the model for the selected instance type. Lastly, you need to provision and manage infrastructure to run load tests and tune the cloud configuration for optimal performance and cost. All of this can delay model deployment and time to market.

Amazon SageMaker Inference Recommender automatically selects the right compute instance type, instance count, container parameters, and model optimizations for inference to maximize performance and minimize cost. You can use SageMaker Inference Recommender from SageMaker Studio, the AWS Command Line Interface (CLI), or the AWS SDK, and get recommendations to deploy your ML model within minutes. You can then deploy your model to one of the recommended instances, or run a fully managed load test on a set of instance types you choose without worrying about testing infrastructure. You can review the results of the load test in SageMaker Studio and evaluate the tradeoffs between latency, throughput, and cost to select the optimal deployment configuration.
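For readers using the AWS SDK for Python (boto3), the sketch below shows what a recommendation job request might look like. The job name, role ARN, and model package ARN are placeholders, not real resources; the actual API call is shown only in comments so the snippet stands on its own without AWS credentials.

```python
# Hedged sketch: assembling a SageMaker Inference Recommender job request.
# All ARNs and names below are placeholder assumptions for illustration.
request = {
    "JobName": "my-recommendation-job",  # placeholder job name
    "JobType": "Default",  # "Advanced" runs a custom load test instead
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    "InputConfig": {
        # A versioned model package registered in the SageMaker Model Registry
        "ModelPackageVersionArn": (
            "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-model/1"
        ),
    },
}

# With AWS credentials configured, the request would be submitted as:
#   import boto3
#   client = boto3.client("sagemaker")
#   client.create_inference_recommendations_job(**request)
# and its instance recommendations polled with:
#   client.describe_inference_recommendations_job(JobName=request["JobName"])
```

A "Default" job returns instance recommendations from a quick benchmark, while an "Advanced" job takes a traffic pattern and stopping conditions for a fuller load test, matching the two workflows described above.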

Amazon SageMaker Inference Recommender is generally available in all regions where SageMaker is available except the AWS China regions. To learn more, see the SageMaker model deployment webpage and the SageMaker Inference Recommender documentation.