Deploy a Multi-Model Endpoint for Real-Time Inference
TUTORIAL
Overview
In this tutorial, you learn how to deploy multiple trained machine learning models to a single real-time inference endpoint as a multi-model endpoint, using an Amazon SageMaker Studio notebook.
SageMaker Studio is an integrated development environment (IDE) for machine learning (ML) that provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks, including model deployment.
SageMaker offers different inference options to support a broad range of use cases:
- SageMaker Real-Time Inference for workloads with low latency requirements on the order of milliseconds.
- SageMaker Serverless Inference for workloads with intermittent or infrequent traffic patterns.
- SageMaker Asynchronous Inference for requests with large payload sizes or long processing times.
- SageMaker Batch Transform to run predictions on batches of data.
In this tutorial, you will use the SageMaker Real-Time Inference option to deploy a set of XGBoost regression models that have already been trained on a synthetic house price prediction dataset. The dataset consists of details on house prices based on features such as the number of bedrooms, square footage, and number of bathrooms. Each of these models predicts housing prices for a single location. You will play the role of a machine learning engineer who deploys these models and runs sample inferences.
What you will accomplish
In this guide, you will:
- Create multiple SageMaker models from respective trained model artifacts
- Configure and deploy a real-time endpoint to serve these models
- Invoke the multi-model endpoint to run sample predictions using test data
Prerequisites
Before starting this guide, you will need:
- An AWS account: If you don’t already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
AWS experience: Intermediate
Time to complete: 15 minutes
Cost to complete: See SageMaker pricing to estimate the cost for this tutorial
Requires: An AWS account
Services used: Amazon SageMaker Real-Time Inference, Amazon SageMaker Studio notebooks
Last updated: March 7, 2022
Implementation
Step 1: Set up Amazon SageMaker Studio domain
If you don't have an existing SageMaker Studio domain, continue with Step 1 to run an AWS CloudFormation template that creates a SageMaker Studio domain and adds the permissions required for the rest of this tutorial.
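If you prefer to launch the CloudFormation stack programmatically rather than through the console, the following is a rough sketch using Boto3; the stack name and template URL shown are hypothetical placeholders that you would replace with the actual template referenced in Step 1.

# Sketch only: launching the CloudFormation stack from code
# The stack name and template URL below are hypothetical placeholders
import boto3

cfn_client = boto3.client("cloudformation")
cfn_client.create_stack(
    StackName="sagemaker-studio-domain-stack",                                 # hypothetical name
    TemplateURL="https://example-bucket.s3.amazonaws.com/studio-domain.yaml",  # hypothetical URL
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles/permissions
)

# Block until the stack finishes creating before moving on to Step 2
cfn_client.get_waiter("stack_create_complete").wait(StackName="sagemaker-studio-domain-stack")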
Step 2: Set up a SageMaker Studio notebook
2.1 – Enter SageMaker Studio into the console search bar, and then choose SageMaker Studio.
2.4 – In the Set up notebook environment dialog box, under Image, select Data Science. The Python 3 kernel is selected automatically. Choose Select.
2.5 – The kernel in the top right corner of the notebook should now display Python 3 (Data Science).
%pip install --upgrade -q aiobotocore
import time
from time import gmtime, strftime

import boto3
import sagemaker
from sagemaker.image_uris import retrieve

# Create a SageMaker session and resolve the default S3 bucket and region
sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()
write_prefix = "housing-prices-prediction-mme-demo"
region = sagemaker_session.boto_region_name

# Boto3 clients for S3, the SageMaker control plane, and the SageMaker runtime (inference)
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime")

# IAM execution role that SageMaker uses to access AWS resources on your behalf
role = sagemaker.get_execution_role()
# S3 locations used for parameterizing the notebook run
read_bucket = "sagemaker-sample-files"
read_prefix = "models/house_price_prediction"
model_prefix = "models/xgb-hpp"

# S3 location of the trained model artifacts
model_artifacts = f"s3://{default_bucket}/{model_prefix}/"

# Locations covered by the pretrained models
location = ['Chicago_IL', 'Houston_TX', 'NewYork_NY', 'LosAngeles_CA']

# Sample record used later to invoke the endpoint
test_data = [1997, 2527, 6, 2.5, 0.57, 1]

# Copy the pretrained model artifacts from the public sample bucket into your default bucket
for loc in location:
    copy_source = {'Bucket': read_bucket, 'Key': f"{read_prefix}/{loc}.tar.gz"}
    s3_client.copy(copy_source, default_bucket, f"{model_prefix}/{loc}.tar.gz")
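To confirm that the four model artifacts were copied successfully, you can optionally list the objects under the model prefix in your default bucket. This check is not part of the original tutorial steps.

# Optional check: list the model artifacts copied into the default bucket
response = s3_client.list_objects_v2(Bucket=default_bucket, Prefix=model_prefix)
for obj in response.get("Contents", []):
    print(obj["Key"])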
Step 3: Create a Real-Time Inference endpoint
In SageMaker, you can deploy a trained model to a Real-Time Inference endpoint using the SageMaker SDK, the AWS SDK - Boto3, or the SageMaker console. For more information, see Deploy Models for Inference in the Amazon SageMaker Developer Guide. The SageMaker SDK provides more abstractions, while the AWS SDK - Boto3 exposes lower-level APIs for greater control over model deployment. In this tutorial, you deploy the model using the AWS SDK - Boto3. To deploy a model, you follow these three steps:
- Create a SageMaker model from the model artifact
- Create an endpoint configuration to specify properties, including instance type and count
- Create the endpoint using the endpoint configuration
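For comparison, the SageMaker SDK wraps these three steps behind a single higher-level class. The following is a minimal sketch only (not part of this tutorial) showing the SDK's MultiDataModel equivalent; it assumes the training_image, model_artifacts, role, and sagemaker_session variables defined in this notebook, and the model name shown is hypothetical.

# Sketch only: equivalent multi-model deployment with the SageMaker SDK (not used in this tutorial)
from sagemaker.multidatamodel import MultiDataModel

mme = MultiDataModel(
    name="housing-prices-mme-sdk-example",  # hypothetical model name
    model_data_prefix=model_artifacts,      # S3 prefix that holds the .tar.gz artifacts
    image_uri=training_image,
    role=role,
    sagemaker_session=sagemaker_session,
)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")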
# Retrieve the SageMaker managed XGBoost image
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not already exist
model_name = "housing-prices-prediction-mme-xgb"

# Multi-model mode: ModelDataUrl points to the S3 prefix that holds all of the model artifacts
primary_container = {
    "Image": training_image,
    "ModelDataUrl": model_artifacts,
    "Mode": "MultiModel"
}

# Create the SageMaker model only if one with the same name does not already exist
model_matches = sm_client.list_models(NameContains=model_name)["Models"]
if not model_matches:
    model = sm_client.create_model(ModelName=model_name,
                                   PrimaryContainer=primary_container,
                                   ExecutionRoleArn=role)
else:
    print(f"Model with name {model_name} already exists! Change model name to create new")
# Endpoint config name
endpoint_config_name = f"{model_name}-endpoint-config"

# Create the endpoint configuration if one with the same name does not exist
endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)["EndpointConfigs"]
if not endpoint_config_matches:
    endpoint_config_response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "InstanceType": "ml.m5.xlarge",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 1,
                "ModelName": model_name,
                "VariantName": "AllTraffic",
            }
        ],
    )
else:
    print(f"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new")
3.4 - You can check the created endpoint configuration in the SageMaker console under the Endpoint configurations section.
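You can also verify the configuration from the notebook itself. The optional check below uses the same Boto3 SageMaker client created earlier.

# Optional: inspect the endpoint configuration programmatically
config_details = sm_client.describe_endpoint_config(EndpointConfigName=endpoint_config_name)
print(config_details["ProductionVariants"])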
# Endpoint name
endpoint_name = f"{model_name}-endpoint"

# Create the endpoint if one with the same name does not exist
endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
if not endpoint_matches:
    endpoint_response = sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name
    )
else:
    print(f"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new")

# Poll until the endpoint leaves the Creating state
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")
Step 4: Invoke the Inference endpoint
# Convert the test record to a CSV string (comma-separated to match the text/csv content type)
payload = ','.join([str(elem) for elem in test_data])

# Invoke each target model hosted behind the multi-model endpoint
for loc in location:
    predicted_value = sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                                        TargetModel=f"{loc}.tar.gz",
                                                        ContentType="text/csv",
                                                        Body=payload)
    print(f"Predicted Value for {loc} target model:\n ${predicted_value['Body'].read().decode('utf-8')}")
Step 5: Clean up resources
# Delete model
sm_client.delete_model(ModelName=model_name)
# Delete endpoint configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# Delete endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)
5.2 – To delete the S3 bucket:
- Open the Amazon S3 console. On the navigation bar, choose Buckets, sagemaker-<your-Region>-<your-account-id>, and then select the checkbox next to models/xgb-hpp. Then, choose Delete.
- On the Delete objects dialog box, verify that you have selected the proper object to delete and enter permanently delete into the Permanently delete objects confirmation box.
- Once this is complete and the bucket is empty, you can delete the sagemaker-<your-Region>-<your-account-id> bucket by following the same procedure again.
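If you prefer to empty the model prefix from the notebook instead of the console, the following optional sketch uses the Boto3 S3 resource; it deletes only the objects under models/xgb-hpp, not the bucket itself.

# Optional: delete the copied model artifacts programmatically instead of through the console
import boto3

s3_resource = boto3.resource("s3")
s3_resource.Bucket(default_bucket).objects.filter(Prefix=model_prefix).delete()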
To delete the SageMaker Studio apps, do the following: On the SageMaker Studio console, choose studio-user, and then delete all the apps listed under Apps by choosing Delete app. Wait until the Status changes to Deleted.
If you used an existing SageMaker Studio domain in Step 1, skip the rest of Step 5 and proceed directly to the conclusion section.
If you ran the CloudFormation template in Step 1 to create a new SageMaker Studio domain, continue with the following steps to delete the domain, user, and the resources created by the CloudFormation template.