Deploy a Machine Learning Model to a Real-Time Inference Endpoint
In this tutorial, you learn how to deploy a trained machine learning (ML) model to a real-time inference endpoint using Amazon SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) for ML that provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks, including model deployment.
SageMaker offers different inference options to support a broad range of use cases:
- SageMaker Real-Time Inference for workloads with low-latency requirements on the order of milliseconds
- SageMaker Serverless Inference for workloads with intermittent or infrequent traffic patterns
- SageMaker Asynchronous Inference for inferences with large payload sizes or requiring long processing times
- SageMaker Batch Transform to run predictions on batches of data
In this tutorial, you will use the Real-Time Inference option to deploy a binary classification XGBoost model that has already been trained on a synthetic auto insurance claims dataset. The dataset consists of details and extracted features from the claims and customer tables, along with a fraud column indicating whether each claim was fraudulent or not. The model predicts the probability that a claim is fraudulent. You will play the role of a machine learning engineer who deploys this model and runs sample inferences.
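Because the model outputs a fraud probability, client code typically converts the raw endpoint response into a probability and a decision. As a minimal sketch (the response format shown, a single probability in a plain-text body, is an assumption, though it is typical for SageMaker's built-in XGBoost binary classifier):

```python
# Sketch: parse a real-time inference response body and flag likely fraud.
# Assumes the endpoint returns one probability as a plain-text/CSV body.

def parse_fraud_prediction(response_body: str, threshold: float = 0.5) -> dict:
    """Convert the raw response body into a probability and a fraud flag."""
    probability = float(response_body.strip())
    return {"fraud_probability": probability, "is_fraud": probability >= threshold}

# Example: a response body of "0.83" is flagged as likely fraud.
result = parse_fraud_prediction("0.83")
print(result)  # {'fraud_probability': 0.83, 'is_fraud': True}
```

The 0.5 threshold is illustrative; in practice you would tune it against the business cost of false positives versus missed fraud.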
What you will accomplish
In this guide, you will:
- Create a SageMaker model from a trained model artifact
- Configure and deploy a real-time inference endpoint to serve the model
- Invoke the endpoint to run sample predictions using test data
- Attach an autoscaling policy to the endpoint to handle traffic changes
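The four steps above map onto a small set of SageMaker and Application Auto Scaling API calls. The following sketch builds the request parameters for each call; every name, ARN, image URI, and S3 path is a hypothetical placeholder, and with boto3 you would pass each dict to the matching call (for example, `sagemaker_client.create_model(**create_model_params)`):

```python
# Sketch of the request parameters for the four deployment steps.
# All names, ARNs, the image URI, and the S3 path are placeholders.

# Step 1: register the trained artifact as a SageMaker model.
create_model_params = {
    "ModelName": "fraud-detect-xgb",  # placeholder
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    "PrimaryContainer": {
        "Image": "<region-specific-xgboost-inference-image-uri>",
        "ModelDataUrl": "s3://<bucket>/model/model.tar.gz",  # trained artifact
    },
}

# Step 2: describe the instances that will serve the model.
endpoint_config_params = {
    "EndpointConfigName": "fraud-detect-xgb-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": create_model_params["ModelName"],
        "InstanceType": "ml.m5.xlarge",  # example instance type
        "InitialInstanceCount": 1,
    }],
}

# Step 3: create the real-time endpoint from that configuration.
endpoint_params = {
    "EndpointName": "fraud-detect-xgb-endpoint",
    "EndpointConfigName": endpoint_config_params["EndpointConfigName"],
}

# Step 4: autoscaling targets the variant's desired instance count.
scaling_target_params = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/fraud-detect-xgb-endpoint/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 2,
}

print(endpoint_params["EndpointName"])
```

Keeping the model name and endpoint config name referenced from the earlier dicts (rather than retyped) is a small safeguard against the three resources drifting out of sync.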
Prerequisites
Before starting this guide, you will need:
- An AWS account: If you don’t already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
Choose the AWS CloudFormation stack link. This link opens the AWS CloudFormation console and creates your SageMaker Studio domain and a user named studio-user. It also adds the required permissions to your SageMaker Studio account. In the CloudFormation console, confirm that US East (N. Virginia) is the Region displayed in the upper right corner. The stack name should be CFN-SM-IM-Lambda-Catalog; do not change it. This stack takes about 10 minutes to create all the resources.
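For reference, the console link corresponds to a single CloudFormation CreateStack call. The sketch below builds its parameters; the template URL is a placeholder (the real one comes from the stack link), and with boto3 you would pass the dict to `cloudformation_client.create_stack(**create_stack_params)`:

```python
# Sketch: the stack launch expressed as CloudFormation API parameters.
# The template URL is a placeholder for the one behind the stack link.

create_stack_params = {
    "StackName": "CFN-SM-IM-Lambda-Catalog",  # keep this exact name
    "TemplateURL": "https://<bucket>.s3.amazonaws.com/<template>.yaml",  # placeholder
    "Capabilities": ["CAPABILITY_IAM"],  # acknowledges IAM resource creation
}
print(create_stack_params["StackName"])
```

The `CAPABILITY_IAM` entry is the API-level equivalent of the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox in the console.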
This stack assumes that you already have a public VPC set up in your account. If you do not have a public VPC, see VPC with a single public subnet to learn how to create a public VPC.
Select I acknowledge that AWS CloudFormation might create IAM resources, and then choose Create stack.
In the CloudFormation console, choose Stacks in the navigation pane. Stack creation takes about 10 minutes; when it completes, the stack status changes from CREATE_IN_PROGRESS to CREATE_COMPLETE.
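If you script this instead of watching the console, the wait is a simple polling loop on the stack status. In this sketch the status source is a stub; with boto3 you would instead read the `StackStatus` field returned by `cloudformation_client.describe_stacks(StackName=...)`:

```python
import itertools
import time

# Sketch: poll a stack's status until it leaves CREATE_IN_PROGRESS.
# get_status is a stub here; in practice it would call describe_stacks.

def wait_for_stack(get_status, poll_seconds: float = 0.0) -> str:
    """Return the terminal status once creation finishes (or fails)."""
    while True:
        status = get_status()
        if status != "CREATE_IN_PROGRESS":
            return status
        time.sleep(poll_seconds)

# Simulated sequence of statuses a stack might report during creation.
statuses = itertools.chain(["CREATE_IN_PROGRESS"] * 3, ["CREATE_COMPLETE"])
print(wait_for_stack(lambda: next(statuses)))  # CREATE_COMPLETE
```

In real use, set `poll_seconds` to something like 15-30 seconds to avoid hammering the API, and treat any status other than CREATE_COMPLETE (for example, ROLLBACK_COMPLETE) as a failed launch.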