Amazon SageMaker Serverless Inference is now generally available

Posted On: Apr 21, 2022

Today, we are excited to announce the general availability of Amazon SageMaker Serverless Inference in all AWS Regions where SageMaker is generally available (except the AWS China Regions). With SageMaker Serverless Inference, you can quickly deploy machine learning (ML) models for inference without having to configure or manage the underlying infrastructure. When deploying your ML models, simply select the serverless option and Amazon SageMaker automatically provisions, scales, and turns off compute capacity based on the volume of inference requests. You pay only for the compute capacity used to process inference requests (billed by the millisecond) and the amount of data processed; you do not pay for idle time. SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic.

Since the preview launch at re:Invent 2021, we have added support for the Amazon SageMaker Python SDK, which offers abstractions to simplify model deployment, and support for Model Registry, which allows you to integrate your serverless inference endpoints with your MLOps workflow. We have also raised the limit on concurrent invocations per endpoint to 200 (from 50 during preview), allowing you to use SageMaker Serverless Inference for high-traffic workloads.

You can create a SageMaker Serverless Inference endpoint from the AWS console, AWS SDK for Python (Boto3), SageMaker Python SDK, AWS CloudFormation, or the AWS Command Line Interface (AWS CLI). SageMaker Serverless Inference is now generally available in the following 21 AWS Regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Milan), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), and South America (São Paulo).
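As a rough illustration of the AWS SDK for Python (Boto3) path mentioned above, the sketch below builds an endpoint configuration whose production variant carries a `ServerlessConfig` block, then creates the endpoint from it. The helper names, the variant name `AllTraffic`, the default memory size of 2048 MB, the default concurrency of 5, and the default Region are assumptions chosen for illustration, and the model is assumed to already exist in SageMaker. Check the developer guide for the memory sizes your account supports.

```python
from typing import Any, Dict

# Per the announcement, an endpoint supports up to 200 concurrent invocations.
MAX_CONCURRENCY_LIMIT = 200


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_size_in_mb: int = 2048,  # assumed default, not taken from the announcement
    max_concurrency: int = 5,       # assumed default, not taken from the announcement
) -> Dict[str, Any]:
    """Build the keyword arguments for create_endpoint_config (hypothetical helper)."""
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(
            f"max_concurrency must be between 1 and {MAX_CONCURRENCY_LIMIT}"
        )
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # illustrative variant name
                "ModelName": model_name,      # must name an existing SageMaker model
                # ServerlessConfig takes the place of instance type and count.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_in_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def create_serverless_endpoint(
    endpoint_name: str,
    config_name: str,
    model_name: str,
    region_name: str = "us-east-1",  # assumed Region; use any supported Region
) -> None:
    """Create the endpoint configuration and the endpoint (needs AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3 installed

    client = boto3.client("sagemaker", region_name=region_name)
    client.create_endpoint_config(
        **build_serverless_endpoint_config(config_name, model_name)
    )
    client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
```

Calling `create_serverless_endpoint("my-endpoint", "my-config", "my-model")` with placeholder names like these would issue the two API calls; endpoint creation is asynchronous, so the endpoint is not ready to serve requests the moment the call returns.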

Get started:

Check out the Amazon SageMaker Serverless Inference blog post
Refer to the SageMaker Serverless Inference developer guide
Download the SageMaker Serverless Inference sample notebooks on GitHub
Visit the Amazon SageMaker Pricing page
