Serverless deployment for your Amazon SageMaker Canvas models

Deploying machine learning (ML) models into production can often be a complex and resource-intensive task, especially for customers without deep ML and DevOps expertise. Amazon SageMaker Canvas simplifies model building by offering a no-code interface, so you can create highly accurate ML models using your existing data sources and without writing a single line of code. But building a model is only half the journey; deploying it efficiently and cost-effectively is just as crucial. Amazon SageMaker Serverless Inference is designed for workloads with variable traffic patterns and idle periods. It automatically provisions and scales infrastructure based on demand, alleviating the need to manage servers or pre-configure capacity.

In this post, we walk through how to take an ML model built in SageMaker Canvas and deploy it using SageMaker Serverless Inference. This solution can help you go from model creation to production-ready predictions quickly, efficiently, and without managing any infrastructure.

Solution overview

To demonstrate serverless endpoint creation for a SageMaker Canvas trained model, let’s explore an example workflow:

  1. Add the trained model to the Amazon SageMaker Model Registry.
  2. Create a new SageMaker model with the correct configuration.
  3. Create a serverless endpoint configuration.
  4. Deploy the serverless endpoint with the created model and endpoint configuration.

You can also automate the process, as illustrated in the following diagram.

Solution architecture

In this example, we deploy a pre-trained classification model to a serverless SageMaker endpoint. This way, we can use our model for variable workloads that can tolerate cold-start latency after idle periods.

Prerequisites

As a prerequisite, you must have access to Amazon Simple Storage Service (Amazon S3) and Amazon SageMaker AI. If you don’t already have a SageMaker AI domain configured in your account, you also need permissions to create a SageMaker AI domain.

You must also have a regression or classification model that you have trained. You can train your SageMaker Canvas model as you normally would. This includes creating the Amazon SageMaker Data Wrangler flow, performing necessary data transformations, and choosing the model training configuration. If you don’t already have a trained model, you can follow one of the labs in the Amazon SageMaker Canvas Immersion Day to create one before continuing. For this example, we use a classification model that was trained on the canvas-sample-shipping-logs.csv sample dataset.

Save your model to the SageMaker Model Registry

Complete the following steps to save your model to the SageMaker Model Registry:

  1. On the SageMaker AI console, choose Studio to launch Amazon SageMaker Studio.
  2. In the SageMaker Studio interface, launch SageMaker Canvas, which will open in a new tab.

Open SageMaker Studio

  3. Locate the model and model version that you want to deploy to your serverless endpoint.
  4. On the options menu (three vertical dots), choose Add to Model Registry.

Save to model registry

You can now exit SageMaker Canvas by logging out. To manage costs and prevent additional workspace charges, you can also configure SageMaker Canvas to automatically shut down when idle.

Approve your model for deployment

After you have added your model to the Model Registry, complete the following steps:

  1. In the SageMaker Studio UI, choose Models in the navigation pane.

The model you just exported from SageMaker Canvas should be added with a deployment status of Pending manual approval.

  2. Choose the model version you want to deploy and update the status to Approved by choosing the deployment status.

Find deploy tab

  3. Choose the model version and navigate to the Deploy tab. This is where you will find the information related to the model and associated container.
  4. Select the container and model location related to the trained model. You can identify it by checking for the presence of the environment variable SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT.

ECR and S3 URIs
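
The same container details can also be read programmatically. The following is a minimal sketch, assuming the model package exposes its containers under InferenceSpecification, as the response of the boto3 describe_model_package call does. The helper names find_inference_container and describe_and_find are hypothetical and not part of any AWS SDK.

```python
def find_inference_container(model_package: dict) -> dict:
    """Return the container that defines SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT."""
    containers = model_package["InferenceSpecification"]["Containers"]
    for container in containers:
        env = container.get("Environment") or {}
        if "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT" in env:
            return {
                "Image": container["Image"],                # Amazon ECR image URI
                "ModelDataUrl": container["ModelDataUrl"],  # Amazon S3 URI of the model artifact
                "Environment": env,
            }
    raise ValueError("No container defines SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT")


def describe_and_find(model_package_arn: str) -> dict:
    """Fetch the model package from SageMaker and pick the inference container."""
    import boto3  # imported lazily so the helper above works without AWS access

    client = boto3.client("sagemaker")
    response = client.describe_model_package(ModelPackageName=model_package_arn)
    return find_inference_container(response)
```

Calling describe_and_find with the ARN of your approved model version should return the same ECR URI, S3 URI, and environment variables that the Deploy tab shows.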

Create a new model

Complete the following steps to create a new model:

  1. Without closing the SageMaker Studio tab, open a new tab and open the SageMaker AI console.
  2. Choose Models in the Inference section and choose Create model.
  3. Name your model.
  4. Leave the container input option as Provide model artifacts and inference image location and use the CompressedModel type.
  5. Enter the Amazon Elastic Container Registry (Amazon ECR) URI, Amazon S3 URI, and environment variables that you located in the previous step.

The environment variables will be shown as a single line in SageMaker Studio, with the following format:

SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT: text/csv, SAGEMAKER_INFERENCE_OUTPUT: predicted_label, SAGEMAKER_INFERENCE_SUPPORTED: predicted_label, SAGEMAKER_PROGRAM: tabular_serve.py, SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code

You might have different variables than those in the preceding example. Add every environment variable listed for your model, and make sure that each one is on its own line when creating your new model.

Model Environment Variables

  6. Choose Create model.
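
If you prefer to script this step, the single-line environment variable string from SageMaker Studio can be split into key-value pairs and passed to the boto3 create_model call. This is a sketch under assumptions: the helper names are hypothetical, and the parser assumes that no value contains the sequence ", " (true for the example above, but worth checking for your model).

```python
def parse_env_line(line: str) -> dict:
    """Split 'KEY: value, KEY: value' into a dict (assumes values contain no ', ')."""
    env = {}
    for pair in line.split(", "):
        if not pair.strip():
            continue
        key, _, value = pair.partition(": ")
        env[key.strip()] = value.strip()
    return env


def build_create_model_request(model_name, image_uri, model_data_url, env, execution_role_arn):
    """Build keyword arguments for the boto3 SageMaker create_model call."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # Amazon ECR URI from the Deploy tab
            "ModelDataUrl": model_data_url,  # Amazon S3 URI from the Deploy tab
            "Environment": env,
        },
        "ExecutionRoleArn": execution_role_arn,
    }


# Usage (requires AWS credentials; all argument values are placeholders):
#   import boto3
#   request = build_create_model_request(name, image_uri, s3_uri, env, role_arn)
#   boto3.client("sagemaker").create_model(**request)
```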

Create an endpoint configuration

Complete the following steps to create an endpoint configuration:

  1. On the SageMaker AI console, choose Endpoint configurations to create a new model endpoint configuration.
  2. Set the type of endpoint to Serverless and set the model variant to the model created in the previous step.

Model endpoint configuration

  3. Choose Create endpoint configuration.
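
The console steps above correspond roughly to the boto3 create_endpoint_config call with a ServerlessConfig block. The following sketch builds that request. The function name is hypothetical, the concurrency range of 1 to 200 mirrors the CloudFormation parameters used later in this post, and the 1 GB memory increments are an assumption based on the documented serverless memory sizes, so verify them for your Region.

```python
ALLOWED_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_size_in_mb=1024, max_concurrency=20):
    """Build keyword arguments for the boto3 SageMaker create_endpoint_config call."""
    if memory_size_in_mb not in ALLOWED_MEMORY_SIZES_MB:
        raise ValueError(f"memory_size_in_mb must be one of {ALLOWED_MEMORY_SIZES_MB}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # ServerlessConfig replaces the instance type and count of a provisioned variant
            "ServerlessConfig": {
                "MemorySizeInMB": memory_size_in_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(
#       **build_serverless_endpoint_config("my-config", "my-model"))
```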

Create an endpoint

Complete the following steps to create an endpoint:

  1. On the SageMaker AI console, choose Endpoints in the navigation pane and create a new endpoint.
  2. Name the endpoint.
  3. Select the endpoint configuration created in the previous step and choose Select endpoint configuration.
  4. Choose Create endpoint.

Model endpoint creation

The endpoint might take a few minutes to be created. When the status is updated to InService, you can begin calling the endpoint.
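
Rather than refreshing the console, you can poll the endpoint status from code. The sketch below is one possible approach, assuming a client object that exposes describe_endpoint the way the boto3 SageMaker client does. The function name and the default poll interval and timeout are illustrative choices, not recommended values.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=15,
                          timeout_seconds=900, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService, Failed, or the timeout passes."""
    waited = 0
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires AWS credentials):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"), "my-endpoint-name")
```

The boto3 SageMaker client also offers a built-in waiter, client.get_waiter("endpoint_in_service"), which covers the same need with less code. The explicit loop is shown here because it makes the status transitions visible.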

The following sample code demonstrates how you can call an endpoint from a Jupyter notebook located in your SageMaker Studio environment:

import ast
import csv
import time
from io import StringIO

import boto3

def invoke_shipping_prediction(features):
    # Runtime client for inference requests
    runtime_client = boto3.client('sagemaker-runtime')

    # Convert to CSV string format
    output = StringIO()
    csv.writer(output).writerow(features)
    payload = output.getvalue()

    response = runtime_client.invoke_endpoint(
        EndpointName='canvas-shipping-data-model-1-serverless-endpoint',
        ContentType='text/csv',
        Accept='text/csv',
        Body=payload
    )

    response_body = response['Body'].read().decode()
    reader = csv.reader(StringIO(response_body))
    result = list(reader)[0]  # Get first row

    # Parse the response into a more usable format.
    # ast.literal_eval parses the list literals without executing arbitrary code.
    prediction = {
        'predicted_label': result[0],
        'confidence': float(result[1]),
        'class_probabilities': ast.literal_eval(result[2]),
        'possible_labels': ast.literal_eval(result[3])
    }
    
    return prediction

# Features for inference
features_set_1 = [
    "Bell",
    "Base",
    14,
    6,
    11,
    11,
    "GlobalFreight",
    "Bulk Order",
    "Atlanta",
    "2020-09-11 00:00:00",
    "Express",
    109.25199890136719
]

features_set_2 = [
    "Bell",
    "Base",
    14,
    6,
    15,
    15,
    "MicroCarrier",
    "Single Order",
    "Seattle",
    "2021-06-22 00:00:00",
    "Standard",
    155.0483856201172
]

# Invoke the SageMaker endpoint for feature set 1
start_time = time.time()
result = invoke_shipping_prediction(features_set_1)

# Print Output and Timing
end_time = time.time()
total_time = end_time - start_time

print(f"Total response time with endpoint cold start: {total_time:.3f} seconds")
print(f"Prediction for feature set 1: {result['predicted_label']}")
print(f"Confidence for feature set 1: {result['confidence']*100:.2f}%")
print("\nProbabilities for feature set 1:")
for label, prob in zip(result['possible_labels'], result['class_probabilities']):
    print(f"{label}: {prob*100:.2f}%")


print("---------------------------------------------------------")

# Invoke the SageMaker endpoint for feature set 2
start_time = time.time()
result = invoke_shipping_prediction(features_set_2)

# Print Output and Timing
end_time = time.time()
total_time = end_time - start_time

print(f"Total response time with warm endpoint: {total_time:.3f} seconds")
print(f"Prediction for feature set 2: {result['predicted_label']}")
print(f"Confidence for feature set 2: {result['confidence']*100:.2f}%")
print("\nProbabilities for feature set 2:")
for label, prob in zip(result['possible_labels'], result['class_probabilities']):
    print(f"{label}: {prob*100:.2f}%")
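
The sample above repeats the same timing code for the cold and warm invocations. As a small optional refactoring, a helper like the following could wrap any invocation function and return its elapsed time. The names timed_call and report_latencies are hypothetical.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def report_latencies(fn, labelled_inputs):
    """Invoke fn once per (label, features) pair and collect the timings in order."""
    timings = []
    for label, features in labelled_inputs:
        _, elapsed = timed_call(fn, features)
        timings.append((label, elapsed))
    return timings


# Usage with the function defined in the sample above:
#   timings = report_latencies(invoke_shipping_prediction,
#                              [("cold start", features_set_1), ("warm", features_set_2)])
#   for label, seconds in timings:
#       print(f"{label}: {seconds:.3f} seconds")
```

Because the first call after an idle period includes the cold start, the order of the inputs matters when you compare the two timings.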

Automate the process

To automatically create serverless endpoints each time a new model is approved, you can use the following YAML file with AWS CloudFormation. This file will automate the creation of SageMaker endpoints with the configuration you specify.

This sample CloudFormation template is provided solely for illustrative purposes and is not intended for direct production use. Developers should thoroughly test this template according to their organization’s security guidelines before deployment.

AWSTemplateFormatVersion: "2010-09-09"
Description: Template for creating Lambda function to handle SageMaker model
  package state changes and create serverless endpoints

Parameters:
  MemorySizeInMB:
    Type: Number
    Default: 1024
    Description: Memory size in MB for the serverless endpoint (1024 to 6144, in 1 GB increments)
    MinValue: 1024
    MaxValue: 6144

  MaxConcurrency:
    Type: Number
    Default: 20
    Description: Maximum number of concurrent invocations for the serverless endpoint
    MinValue: 1
    MaxValue: 200

  AllowedRegion:
    Type: String
    Default: "us-east-1"
    Description: AWS region where SageMaker resources can be created

  AllowedDomainId:
    Type: String
    Description: SageMaker Studio domain ID that can trigger deployments
    NoEcho: true

  AllowedDomainIdParameterName:
    Type: String
    Default: "/sagemaker/serverless-deployment/allowed-domain-id"
    Description: SSM Parameter name containing the SageMaker Studio domain ID that can trigger deployments

Resources:
  AllowedDomainIdParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Ref AllowedDomainIdParameterName
      Type: String
      Value: !Ref AllowedDomainId
      Description: SageMaker Studio domain ID that can trigger deployments

  SageMakerAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Managed policy for SageMaker serverless endpoint creation
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - sagemaker:CreateModel
              - sagemaker:CreateEndpointConfig
              - sagemaker:CreateEndpoint
              - sagemaker:DescribeModel
              - sagemaker:DescribeEndpointConfig
              - sagemaker:DescribeEndpoint
              - sagemaker:DeleteModel
              - sagemaker:DeleteEndpointConfig
              - sagemaker:DeleteEndpoint
            Resource: !Sub "arn:aws:sagemaker:${AllowedRegion}:${AWS::AccountId}:*"
          - Effect: Allow
            Action:
              - sagemaker:DescribeModelPackage
            Resource: !Sub "arn:aws:sagemaker:${AllowedRegion}:${AWS::AccountId}:model-package/*/*"
          - Effect: Allow
            Action:
              - iam:PassRole
            Resource: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AmazonSageMaker-ExecutionRole-*"
            Condition:
              StringEquals:
                "iam:PassedToService": "sagemaker.amazonaws.com"
          - Effect: Allow
            Action:
              - ssm:GetParameter
            Resource: !Sub "arn:aws:ssm:${AllowedRegion}:${AWS::AccountId}:parameter${AllowedDomainIdParameterName}"

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - !Ref SageMakerAccessPolicy

  ModelDeploymentFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import os
          import json
          import boto3

          sagemaker_client = boto3.client('sagemaker')
          ssm_client = boto3.client('ssm')

          def handler(event, context):
              print(f"Received event: {json.dumps(event, indent=2)}")
              try:
                  # Get details directly from the event
                  detail = event['detail']
                  print(f'detail: {detail}')
                  
                  # Get allowed domain ID from SSM Parameter Store
                  parameter_name = os.environ.get('ALLOWED_DOMAIN_ID_PARAMETER_NAME')
                  try:
                      response = ssm_client.get_parameter(Name=parameter_name)
                      allowed_domain = response['Parameter']['Value']
                  except Exception as e:
                      print(f"Error retrieving parameter {parameter_name}: {str(e)}")
                      allowed_domain = '*'  # Fails open: any domain is allowed if the parameter can't be read
                  
                  # Check if domain ID is allowed
                  if allowed_domain != '*':
                      created_by_domain = detail.get('CreatedBy', {}).get('DomainId')
                      if created_by_domain != allowed_domain:
                          print(f"Domain {created_by_domain} not allowed. Allowed: {allowed_domain}")
                          return {'statusCode': 403, 'body': 'Domain not authorized'}

                  # Get the model package ARN from the event resources
                  model_package_arn = event['resources'][0]

                  # Get the model package details from SageMaker
                  model_package_response = sagemaker_client.describe_model_package(
                      ModelPackageName=model_package_arn
                  )

                  # Parse model name and version from ModelPackageName
                  model_name, version = detail['ModelPackageName'].split('/')
                  serverless_model_name = f"{model_name}-{version}-serverless"

                  # Get all container details directly from the event
                  container_defs = detail['InferenceSpecification']['Containers']

                  # Get the execution role from the event and convert to proper IAM role ARN format
                  assumed_role_arn = detail['CreatedBy']['IamIdentity']['Arn']
                  execution_role_arn = assumed_role_arn.replace(':sts:', ':iam:')\
                                                   .replace('assumed-role', 'role/service-role')\
                                                   .rsplit('/', 1)[0]

                  # Prepare containers configuration for the model
                  containers = []
                  for i, container_def in enumerate(container_defs):
                      # Get environment variables from the model package for this container
                      environment_vars = model_package_response['InferenceSpecification']['Containers'][i].get('Environment', {}) or {}
                      
                      containers.append({
                          'Image': container_def['Image'],
                          'ModelDataUrl': container_def['ModelDataUrl'],
                          'Environment': environment_vars
                      })

                  # Create model with all containers
                  if len(containers) == 1:
                      # Use PrimaryContainer if there's only one container
                      create_model_response = sagemaker_client.create_model(
                          ModelName=serverless_model_name,
                          PrimaryContainer=containers[0],
                          ExecutionRoleArn=execution_role_arn
                      )
                  else:
                      # Use Containers parameter for multiple containers
                      create_model_response = sagemaker_client.create_model(
                          ModelName=serverless_model_name,
                          Containers=containers,
                          ExecutionRoleArn=execution_role_arn
                      )

                  # Create endpoint config
                  endpoint_config_name = f"{serverless_model_name}-config"
                  create_endpoint_config_response = sagemaker_client.create_endpoint_config(
                      EndpointConfigName=endpoint_config_name,
                      ProductionVariants=[{
                          'VariantName': 'AllTraffic',
                          'ModelName': serverless_model_name,
                          'ServerlessConfig': {
                              'MemorySizeInMB': int(os.environ.get('MEMORY_SIZE_IN_MB')),
                              'MaxConcurrency': int(os.environ.get('MAX_CONCURRENT_INVOCATIONS'))
                          }
                      }]
                  )

                  # Create endpoint
                  endpoint_name = f"{serverless_model_name}-endpoint"
                  create_endpoint_response = sagemaker_client.create_endpoint(
                      EndpointName=endpoint_name,
                      EndpointConfigName=endpoint_config_name
                  )

                  return {
                      'statusCode': 200,
                      'body': json.dumps({
                          'message': 'Serverless endpoint deployment initiated',
                          'endpointName': endpoint_name
                      })
                  }

              except Exception as e:
                  print(f"Error: {str(e)}")
                  raise
      Runtime: python3.12
      Timeout: 300
      MemorySize: 128
      Environment:
        Variables:
          MEMORY_SIZE_IN_MB: !Ref MemorySizeInMB
          MAX_CONCURRENT_INVOCATIONS: !Ref MaxConcurrency
          ALLOWED_DOMAIN_ID_PARAMETER_NAME: !Ref AllowedDomainIdParameterName

  EventRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Rule to trigger Lambda when SageMaker Model Package state changes
      EventPattern:
        source:
          - aws.sagemaker
        detail-type:
          - SageMaker Model Package State Change
        detail:
          ModelApprovalStatus:
            - Approved
          UpdatedModelPackageFields:
            - ModelApprovalStatus
      State: ENABLED
      Targets:
        - Arn: !GetAtt ModelDeploymentFunction.Arn
          Id: ModelDeploymentFunction

  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref ModelDeploymentFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt EventRule.Arn

Outputs:
  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ModelDeploymentFunction.Arn
  EventRuleArn:
    Description: ARN of the EventBridge rule
    Value: !GetAtt EventRule.Arn
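
To make the naming and role logic inside the Lambda function easier to follow, the two derivations can be isolated as plain functions. This is a sketch that mirrors the string handling in the template above. The function names and the sample ARN are hypothetical, and the role derivation only works when the approving identity is an assumed SageMaker execution role stored under the service-role path.

```python
def derive_execution_role_arn(assumed_role_arn: str) -> str:
    """Convert an STS assumed-role ARN into the IAM role ARN under service-role/."""
    return (
        assumed_role_arn.replace(":sts:", ":iam:")
        .replace("assumed-role", "role/service-role")
        .rsplit("/", 1)[0]  # drop the trailing session name
    )


def derive_resource_names(model_package_name: str) -> dict:
    """Derive model, endpoint config, and endpoint names from '<group>/<version>'."""
    model_name, version = model_package_name.split("/")
    base = f"{model_name}-{version}-serverless"
    return {
        "model": base,
        "endpoint_config": f"{base}-config",
        "endpoint": f"{base}-endpoint",
    }


# Hypothetical inputs shaped like the values in the EventBridge event
role = derive_execution_role_arn(
    "arn:aws:sts::123456789012:assumed-role/AmazonSageMaker-ExecutionRole-EXAMPLE/SageMaker"
)
names = derive_resource_names("canvas-shipping-data-model/1")
print(role)
print(names["endpoint"])
```

With a model package named canvas-shipping-data-model at version 1, the derived endpoint name matches the one used in the invocation sample earlier in this post.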

This stack limits automated serverless endpoint creation to a specific AWS Region and domain. You can find your domain ID when accessing SageMaker Studio from the SageMaker AI console, or by running the following command: aws sagemaker list-domains --region [your-region]
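
One way to launch the stack from code is the boto3 CloudFormation create_stack call. The sketch below only builds the request. The function name, the stack name, and the template file name are assumptions, and CAPABILITY_IAM is included because the template creates an IAM role and managed policy.

```python
def build_stack_request(stack_name, template_body, domain_id, region="us-east-1",
                        memory_size_in_mb=1024, max_concurrency=20):
    """Build keyword arguments for the boto3 CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Capabilities": ["CAPABILITY_IAM"],  # the template creates IAM resources
        "Parameters": [
            {"ParameterKey": "AllowedDomainId", "ParameterValue": domain_id},
            {"ParameterKey": "AllowedRegion", "ParameterValue": region},
            # CloudFormation expects parameter values as strings
            {"ParameterKey": "MemorySizeInMB", "ParameterValue": str(memory_size_in_mb)},
            {"ParameterKey": "MaxConcurrency", "ParameterValue": str(max_concurrency)},
        ],
    }


# Usage (requires AWS credentials; file and stack names are placeholders):
#   import boto3
#   with open("serverless-deployment.yaml") as f:
#       request = build_stack_request("canvas-serverless-deploy", f.read(), "d-xxxxxxxxxxxx")
#   boto3.client("cloudformation", region_name="us-east-1").create_stack(**request)
```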

Clean up

To manage costs and prevent additional workspace charges, make sure that you have logged out of SageMaker Canvas. If you tested your endpoint using a Jupyter notebook, you can shut down your JupyterLab instance by choosing Stop or configuring automated shutdown for JupyterLab.

Stop Jupyter Lab Space
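
If you no longer need the serverless endpoint, endpoint configuration, and model created in this walkthrough, you can optionally delete them as well. The following is a minimal sketch using the boto3 SageMaker delete calls. The function name is hypothetical, and the resource names are whatever you chose in the earlier steps.

```python
def delete_serverless_resources(client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint first, then its configuration, then the model."""
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    client.delete_model(ModelName=model_name)


# Usage (requires AWS credentials; names are placeholders):
#   import boto3
#   delete_serverless_resources(boto3.client("sagemaker"),
#                               "my-endpoint", "my-endpoint-config", "my-model")
```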

Conclusion

In this post, we showed how to deploy a SageMaker Canvas model to a serverless endpoint using SageMaker Serverless Inference. By using this serverless approach, you can quickly and efficiently serve predictions from your SageMaker Canvas models without needing to manage the underlying infrastructure.

This seamless deployment experience is just one example of how AWS services like SageMaker Canvas and SageMaker Serverless Inference simplify the ML journey, helping businesses of different sizes and technical proficiencies unlock the value of AI and ML. As you continue exploring the SageMaker ecosystem, be sure to check out how you can unlock data governance for no-code ML with Amazon DataZone, and seamlessly transition between no-code and code-first model development using SageMaker Canvas and SageMaker Studio.


About the authors

Nadhya Polanco is a Solutions Architect at AWS based in Brussels, Belgium. In this role, she supports organizations looking to incorporate AI and Machine Learning into their workloads. In her free time, Nadhya enjoys indulging in her passion for coffee and traveling.

Brajendra Singh is a Principal Solutions Architect at Amazon Web Services, where he partners with enterprise customers to design and implement innovative solutions. With a strong background in software development, he brings deep expertise in Data Analytics, Machine Learning, and Generative AI.