Deploy a Multi-Model Endpoint for Real-Time Inference
TUTORIAL
Overview
In this tutorial, you learn how to deploy multiple trained machine learning models to a single real-time inference endpoint as a multi-model endpoint, using an Amazon SageMaker Studio notebook.
SageMaker Studio is an integrated development environment (IDE) for machine learning (ML) that provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks, including model deployment.
SageMaker offers different inference options to support a broad range of use cases:
- SageMaker Real-Time Inference for workloads with low latency requirements on the order of milliseconds.
- SageMaker Serverless Inference for workloads with intermittent or infrequent traffic patterns.
- SageMaker Asynchronous Inference for requests with large payload sizes or long processing times.
- SageMaker Batch Transform to run predictions on batches of data.
In this tutorial, you will use the SageMaker Real-Time Inference option to deploy a set of XGBoost regression models that have already been trained on a synthetic house price prediction dataset. The dataset consists of details on house prices based on features such as the number of bedrooms, square footage, and number of bathrooms. Each of these models predicts housing prices for a single location. You will play the role of a machine learning engineer who deploys these models and runs sample inferences.
What you will accomplish
In this guide, you will:
- Create multiple SageMaker models from respective trained model artifacts
- Configure and deploy a real-time endpoint to serve these models
- Invoke the multi-model endpoint to run sample predictions using test data
Prerequisites
Before starting this guide, you will need:
- An AWS account: If you don’t already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
AWS experience: Intermediate
Time to complete: 15 minutes
Cost to complete: See SageMaker pricing to estimate the cost for this tutorial
Requires: An AWS account
Services used: Amazon SageMaker Real-Time Inference, Amazon SageMaker Studio notebooks
Last updated: March 7, 2022
Implementation
Step 1: Set up Amazon SageMaker Studio domain
If you don't have an existing SageMaker Studio domain, continue with Step 1 to run an AWS CloudFormation template that creates a SageMaker Studio domain and adds the permissions required for the rest of this tutorial.
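If you prefer to launch the CloudFormation stack programmatically rather than through the console, the following is a rough sketch using Boto3; the stack name and template URL shown are hypothetical placeholders that you would replace with the actual template referenced in Step 1.

# Sketch only: launching the CloudFormation stack from code
# The stack name and template URL below are hypothetical placeholders
import boto3

cfn_client = boto3.client("cloudformation")
cfn_client.create_stack(
    StackName="sagemaker-studio-domain-stack",                                 # hypothetical name
    TemplateURL="https://example-bucket.s3.amazonaws.com/studio-domain.yaml",  # hypothetical URL
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles/permissions
)

# Block until the stack finishes creating before moving on to Step 2
cfn_client.get_waiter("stack_create_complete").wait(StackName="sagemaker-studio-domain-stack")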
Step 2: Set up a SageMaker Studio notebook
2.1 – Enter SageMaker Studio into the console search bar, and then choose SageMaker Studio.
2.4 – In the Set up notebook environment dialog box, under Image, select Data Science. The Python 3 kernel is selected automatically. Choose Select.
2.5 – The kernel in the top right corner of the notebook should now display Python 3 (Data Science).
%pip install --upgrade -q aiobotocore
import time
from time import gmtime, strftime

import boto3
import sagemaker
from sagemaker.image_uris import retrieve

# Create a SageMaker session and resolve the default S3 bucket and region
sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()
write_prefix = "housing-prices-prediction-mme-demo"
region = sagemaker_session.boto_region_name

# Boto3 clients for S3, the SageMaker control plane, and the SageMaker runtime (inference)
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime")

# IAM execution role that SageMaker uses to access AWS resources on your behalf
role = sagemaker.get_execution_role()
# S3 locations used for parameterizing the notebook run
read_bucket = "sagemaker-sample-files"
read_prefix = "models/house_price_prediction"
model_prefix = "models/xgb-hpp"

# S3 location of the trained model artifacts
model_artifacts = f"s3://{default_bucket}/{model_prefix}/"

# Locations covered by the pretrained models
location = ['Chicago_IL', 'Houston_TX', 'NewYork_NY', 'LosAngeles_CA']

# Sample record used later to invoke the endpoint
test_data = [1997, 2527, 6, 2.5, 0.57, 1]

# Copy the pretrained model artifacts from the public sample bucket into your default bucket
for loc in location:
    copy_source = {'Bucket': read_bucket, 'Key': f"{read_prefix}/{loc}.tar.gz"}
    s3_client.copy(copy_source, default_bucket, f"{model_prefix}/{loc}.tar.gz")
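To confirm that the four model artifacts were copied successfully, you can optionally list the objects under the model prefix in your default bucket. This check is not part of the original tutorial steps.

# Optional check: list the model artifacts copied into the default bucket
response = s3_client.list_objects_v2(Bucket=default_bucket, Prefix=model_prefix)
for obj in response.get("Contents", []):
    print(obj["Key"])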
Step 3: Create a Real-Time Inference endpoint
In SageMaker, you can deploy a trained model to a Real-Time Inference endpoint using the SageMaker SDK, the AWS SDK - Boto3, or the SageMaker console. For more information, see Deploy Models for Inference in the Amazon SageMaker Developer Guide. The SageMaker SDK provides more abstractions, while the AWS SDK - Boto3 exposes lower-level APIs for greater control over model deployment. In this tutorial, you deploy the model using the AWS SDK - Boto3. To deploy a model, you follow these three steps:
- Create a SageMaker model from the model artifact
- Create an endpoint configuration to specify properties, including instance type and count
- Create the endpoint using the endpoint configuration
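For comparison, the SageMaker SDK wraps these three steps behind a single higher-level class. The following is a minimal sketch only (not part of this tutorial) showing the SDK's MultiDataModel equivalent; it assumes the training_image, model_artifacts, role, and sagemaker_session variables defined in this notebook, and the model name shown is hypothetical.

# Sketch only: equivalent multi-model deployment with the SageMaker SDK (not used in this tutorial)
from sagemaker.multidatamodel import MultiDataModel

mme = MultiDataModel(
    name="housing-prices-mme-sdk-example",  # hypothetical model name
    model_data_prefix=model_artifacts,      # S3 prefix that holds the .tar.gz artifacts
    image_uri=training_image,
    role=role,
    sagemaker_session=sagemaker_session,
)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")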
# Retrieve the SageMaker managed XGBoost image
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not already exist
model_name = "housing-prices-prediction-mme-xgb"

# Multi-model mode: ModelDataUrl points to the S3 prefix that holds all of the model artifacts
primary_container = {
    "Image": training_image,
    "ModelDataUrl": model_artifacts,
    "Mode": "MultiModel"
}

# Create the SageMaker model only if one with the same name does not already exist
model_matches = sm_client.list_models(NameContains=model_name)["Models"]
if not model_matches:
    model = sm_client.create_model(ModelName=model_name,
                                   PrimaryContainer=primary_container,
                                   ExecutionRoleArn=role)
else:
    print(f"Model with name {model_name} already exists! Change model name to create new")
# Endpoint config name
endpoint_config_name = f"{model_name}-endpoint-config"

# Create the endpoint configuration if one with the same name does not exist
endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)["EndpointConfigs"]
if not endpoint_config_matches:
    endpoint_config_response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "InstanceType": "ml.m5.xlarge",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 1,
                "ModelName": model_name,
                "VariantName": "AllTraffic",
            }
        ],
    )
else:
    print(f"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new")
3.4 - You can check the created endpoint configuration in the SageMaker console under the Endpoint configurations section.
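You can also verify the configuration from the notebook itself. The optional check below uses the same Boto3 SageMaker client created earlier.

# Optional: inspect the endpoint configuration programmatically
config_details = sm_client.describe_endpoint_config(EndpointConfigName=endpoint_config_name)
print(config_details["ProductionVariants"])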
# Endpoint name
endpoint_name = f"{model_name}-endpoint"

# Create the endpoint if one with the same name does not exist
endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
if not endpoint_matches:
    endpoint_response = sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name
    )
else:
    print(f"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new")

# Poll until the endpoint leaves the Creating state
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")
Step 4: Invoke the Inference endpoint
# Convert the test record to a CSV string (comma-separated to match the text/csv content type)
payload = ','.join([str(elem) for elem in test_data])

# Invoke each target model hosted behind the multi-model endpoint
for loc in location:
    predicted_value = sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                                        TargetModel=f"{loc}.tar.gz",
                                                        ContentType="text/csv",
                                                        Body=payload)
    print(f"Predicted Value for {loc} target model:\n ${predicted_value['Body'].read().decode('utf-8')}")
Step 5: Clean up resources
# Delete model
sm_client.delete_model(ModelName=model_name)
# Delete endpoint configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# Delete endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)
5.2 – To delete the S3 bucket:
- Open the Amazon S3 console. On the navigation bar, choose Buckets, sagemaker-<your-Region>-<your-account-id>, and then select the checkbox next to models/xgb-hpp. Then, choose Delete.
- On the Delete objects dialog box, verify that you have selected the proper object to delete and enter permanently delete into the Permanently delete objects confirmation box.
- Once this is complete and the bucket is empty, you can delete the sagemaker-<your-Region>-<your-account-id> bucket by following the same procedure again.
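If you prefer to empty the model prefix from the notebook instead of the console, the following optional sketch uses the Boto3 S3 resource; it deletes only the objects under models/xgb-hpp, not the bucket itself.

# Optional: delete the copied model artifacts programmatically instead of through the console
import boto3

s3_resource = boto3.resource("s3")
s3_resource.Bucket(default_bucket).objects.filter(Prefix=model_prefix).delete()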
To delete the SageMaker Studio apps, do the following: On the SageMaker Studio console, choose studio-user, and then delete all the apps listed under Apps by choosing Delete app. Wait until the Status changes to Deleted.
If you used an existing SageMaker Studio domain in Step 1, skip the rest of Step 5 and proceed directly to the conclusion section.
If you ran the CloudFormation template in Step 1 to create a new SageMaker Studio domain, continue with the following steps to delete the domain, user, and the resources created by the CloudFormation template.