Posted On: Dec 1, 2021

Amazon SageMaker Serverless Inference is a new inference option that enables you to easily deploy machine learning models for inference without having to configure or manage the underlying infrastructure. Simply select the serverless option when deploying your machine learning model, and Amazon SageMaker automatically provisions, scales, and turns off compute capacity based on the volume of inference requests. With SageMaker Serverless Inference, you pay only for the duration of running the inference code and the amount of data processed, not for idle time.

Amazon SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic. For example, a chatbot service used by a payroll processing company experiences an increase in inquiries at the end of the month, while traffic for the rest of the month is intermittent. Provisioning instances for the entire month in such scenarios is not cost-effective, because you end up paying for idle periods. Amazon SageMaker Serverless Inference addresses these use cases by automatically scaling compute capacity based on the volume of inference requests, without the need for you to forecast traffic demand up front or manage scaling policies. Additionally, you pay only for the compute time used to run your inference code (billed in milliseconds) and the amount of data processed, making it a cost-effective option for workloads with intermittent traffic.

With the introduction of SageMaker Serverless Inference, SageMaker now offers four inference options, expanding the deployment choices available to a wide range of use cases. The other three options are: SageMaker Real-Time Inference for workloads with low latency requirements on the order of milliseconds, SageMaker Batch Transform to run predictions on batches of data, and SageMaker Asynchronous Inference for inferences with large payload sizes or long processing times. To learn more, visit the Amazon SageMaker deployment webpage.
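To make the idle-time argument concrete, here is a rough Python sketch that compares duration-based billing with an always-on instance for a month of intermittent traffic, as in the payroll chatbot scenario above. Every number in it (request count, latency, and both rates) is a hypothetical placeholder chosen for illustration, not actual SageMaker pricing, and the data-processing charge is left out for simplicity. See the SageMaker pricing page for real rates.

```python
def serverless_compute_cost(requests, avg_ms_per_request, price_per_second):
    """Cost when you are billed only for the milliseconds spent running inference."""
    billed_ms = requests * avg_ms_per_request
    return (billed_ms / 1000.0) * price_per_second


def provisioned_cost(hours, price_per_hour):
    """Cost of an instance that runs (and bills) for every hour, busy or idle."""
    return hours * price_per_hour


# All values below are hypothetical and for illustration only.
monthly_requests = 200_000        # assumed request volume for the month
avg_ms_per_request = 120          # assumed inference duration per request
serverless_rate_per_second = 0.00008  # assumed rate, NOT a real price
instance_rate_per_hour = 0.12         # assumed rate, NOT a real price
hours_in_month = 30 * 24

serverless = serverless_compute_cost(
    monthly_requests, avg_ms_per_request, serverless_rate_per_second
)
provisioned = provisioned_cost(hours_in_month, instance_rate_per_hour)

print(f"Serverless compute (hypothetical rates): ${serverless:.2f}")
print(f"Always-on instance (hypothetical rates): ${provisioned:.2f}")
```

Under these assumed numbers, the endpoint is busy for only about 6.7 hours out of 720, which is why billing by duration comes out far lower. With sustained, heavy traffic the comparison can tip the other way, so it is worth rerunning the arithmetic with your own volumes and the published rates.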

You can easily create a SageMaker Serverless Inference endpoint from the console, the AWS SDKs, or the AWS Command Line Interface (CLI). For detailed steps on how to get started, see the SageMaker Serverless Inference documentation, which also includes a sample notebook. For pricing information, see the SageMaker pricing page. SageMaker Serverless Inference is available in preview in US East (Northern Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Sydney).
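As a minimal sketch of the SDK route, the Python example below uses boto3 (the AWS SDK for Python) to build an endpoint configuration whose production variant carries a `ServerlessConfig` instead of an instance type, and then creates the endpoint. The helper names, resource names, and default values are hypothetical, and the list of allowed memory sizes reflects my understanding of the service limits, so check it against the current documentation. The example assumes a SageMaker model with the given name already exists and that AWS credentials and a Region are configured.

```python
# Memory sizes assumed to be accepted by ServerlessConfig (1 GB steps, 1-6 GB).
# This list is an assumption; verify it against the current documentation.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Return keyword arguments for the SageMaker create_endpoint_config call."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # ServerlessConfig takes the place of InstanceType and
                # InitialInstanceCount used by instance-based variants.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def deploy_serverless_endpoint(endpoint_name, config_name, model_name):
    """Create the endpoint config and the endpoint (makes real AWS calls)."""
    import boto3  # imported here so the builder above works without boto3

    sagemaker = boto3.client("sagemaker")
    config = build_serverless_endpoint_config(config_name, model_name)
    sagemaker.create_endpoint_config(**config)
    sagemaker.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )


# Hypothetical names; calling deploy_serverless_endpoint creates billable resources.
example = build_serverless_endpoint_config("demo-serverless-config", "demo-model")
print(example["ProductionVariants"][0]["ServerlessConfig"])
```

Separating the request builder from the deployment call keeps the configuration easy to inspect and test without touching an AWS account. Once the endpoint is in service, it is invoked the same way as other SageMaker endpoints, through the `invoke_endpoint` operation of the `sagemaker-runtime` client.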