AWS Machine Learning Blog

Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda

December 2022: This post was reviewed and updated for accuracy.

At AWS Machine Learning (ML) workshops, customers often ask, “After I deploy an endpoint, where do I go from there?” You can deploy an Amazon SageMaker trained and validated ML model as an online endpoint in production. Alternatively, you can choose which SageMaker functionality to use. For example, you can choose just to train a model or to host one. Whether you choose one SageMaker functionality or use them all, you invoke the model as an endpoint deployed somewhere.

The following diagram shows how the deployed model is called using a serverless architecture. Starting from the client side, a client script calls an Amazon API Gateway API action and passes parameter values. API Gateway is the layer that exposes the API to the client. It also shields the backend, so clients never call the AWS Lambda function directly. API Gateway passes the parameter values to the Lambda function. The Lambda function parses the values and sends them to the SageMaker model endpoint. The model performs the prediction and returns the predicted value to Lambda. The Lambda function parses the returned value and sends it back to API Gateway, which responds to the client with that value.

In this post, I show you how to invoke a model endpoint deployed by SageMaker using API Gateway and Lambda. For testing purposes, we use Postman.


Breast cancer model notebook

We use a sample notebook provided by SageMaker called Breast Cancer Prediction.ipynb. You have access to this notebook on the SageMaker Examples tab.

Choosing Use creates a folder and loads the notebook. The breast cancer prediction model predicts whether a breast mass is a malignant tumor or benign by looking at features computed from a digitized image of a fine needle aspirate of a breast mass. The data used to train the model consists of the diagnosis as well as the 10 real-valued features that are computed for each cell nucleus (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension). The prediction returned by the model is either 0 or 1; 0 being benign and 1 being malignant tumor. The Lambda function converts this value to be either B for benign or M for malignant tumor.
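
The mapping from the model's numeric class to the letter label can be sketched as follows. The function name to_diagnosis_label is hypothetical and does not come from the sample notebook. The sketch assumes the prediction has already been reduced to 0 or 1.

```python
def to_diagnosis_label(prediction: int) -> str:
    """Map the model's class (0 or 1) to 'B' (benign) or 'M' (malignant)."""
    if prediction not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {prediction!r}")
    return "M" if prediction == 1 else "B"


# Print the label for each of the two classes.
for value in (0, 1):
    print(value, "->", to_diagnosis_label(value))
```

Rejecting any other value keeps an unexpected model response from being silently reported as benign.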

SageMaker has managed built-in Jupyter notebooks that allow you to write code in Python or R to explore, analyze, and model a small set of data. The sample Jupyter notebooks are loaded onto a notebook instance when the instance boots up. Each sample notebook contains markdown-based comments that explain each step, from downloading training data and performing the training to deploying a model endpoint. After the model is trained and deployed, you can invoke the model endpoint using the SageMaker runtime API. To avoid managing servers and infrastructure, we wrap this invocation in API Gateway and Lambda.

Create a SageMaker model endpoint

To create your model endpoint, complete the following steps:

  1. Open the Breast Cancer Prediction.ipynb sample notebook.
  2. Comment out the last cell by inserting #, because it deletes the endpoint created in the previous cell.
  3. Run the entire notebook by choosing Run All on the Cell menu.

Alternatively, you can run each cell one by one by pressing Shift + Enter. If you run each cell, you can learn what each step is doing. For the purpose of this post, I chose Run All to deploy the model as an endpoint after the model training is complete.

Upon creation, you can view this endpoint on the SageMaker console. The default endpoint name looks like linear-endpoint-201803211721, but you can make it more meaningful. I called mine linear-learner-breast-cancer-prediction-endpoint.
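
Besides the console, you can check the endpoint programmatically. The following is a minimal sketch, assuming Boto3 is installed and AWS credentials are configured. The helper names endpoint_status and main are hypothetical, and the endpoint name is the one chosen in this post, so substitute your own.

```python
def endpoint_status(sagemaker_client, endpoint_name: str) -> str:
    """Return the EndpointStatus field reported by DescribeEndpoint."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


def main() -> None:
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    client = boto3.client("sagemaker")
    # Assumed name; use the endpoint name shown on your SageMaker console.
    name = "linear-learner-breast-cancer-prediction-endpoint"
    print(name, "is", endpoint_status(client, name))
```

Calling main() with valid credentials prints the current status. The endpoint can serve requests once the status is InService.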

Create a Lambda function that calls the SageMaker runtime invoke_endpoint

Now we have a SageMaker model endpoint. Let’s look at how we call it from Lambda. We use the SageMaker runtime InvokeEndpoint API action through the Boto3 sagemaker-runtime client's invoke_endpoint() method.

  1. On the Lambda console, on the Functions page, choose Create function.
  2. For Function name, enter a name.
  3. For Runtime, choose your runtime.
  4. For Execution role, select Create a new role or Use an existing role.

If you chose Create a new role, after the Lambda function is created, go to the Configuration tab and find the name of the IAM role that was created. Choose the role name to open it on the IAM console.

  5. Whether you created a new role or are using an existing role, make sure it includes the following policy, which gives your function permission to invoke a model endpoint:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "*"
        }
    ]
}

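The policy above allows the function to invoke any endpoint in the account. As a sketch of a narrower alternative, the following builds the same statement scoped to a single endpoint ARN. The function name invoke_endpoint_policy is hypothetical, and the Region and account ID are placeholders for illustration only.

```python
import json


def invoke_endpoint_policy(region: str, account_id: str, endpoint_name: str) -> dict:
    """Build an IAM policy document allowing InvokeEndpoint on one endpoint."""
    # SageMaker endpoint ARNs use the lowercase endpoint name.
    arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name.lower()}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": arn,
            }
        ],
    }


# Placeholder Region and account ID; replace them with your own values.
policy = invoke_endpoint_policy(
    "us-east-1", "123456789012", "linear-learner-breast-cancer-prediction-endpoint"
)
print(json.dumps(policy, indent=4))
```

You can paste the printed JSON into the role's policy editor in place of the wildcard version.
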
The following is the sample Lambda function code:

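import os
import io
import boto3
import json
import csv

# grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # API Gateway passes the request body through as the event
    data = json.loads(json.dumps(event))
    payload = data['data']

    # Send the CSV row to the model endpoint
    # (assumes the endpoint accepts text/csv input, matching the CSV test data)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)

    # Parse the JSON response; int() truncates the score toward zero
    result = json.loads(response['Body'].read().decode())
    pred = int(result['predictions'][0]['score'])
    predicted_label = 'M' if pred == 1 else 'B'

    return predicted_label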

ENDPOINT_NAME is an environment variable that holds the name of the SageMaker model endpoint you just deployed using the sample notebook. Go to the SageMaker console to find the endpoint name generated by SageMaker, and enter it as the environment variable value. It looks like DEMO-linear-endpoint-xxxxxxxxx.

The event that invokes the Lambda function is triggered by API Gateway. API Gateway simply passes the test data through an event.
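
To see how the event flows through the function without deploying anything, the following sketch mirrors the handler logic against a stand-in client. FakeRuntime and predict_label are hypothetical names, the response shape is the one the handler reads (predictions[0].score), the text/csv content type is an assumption, and the feature values are made up rather than taken from the validation data.

```python
import io
import json


class FakeRuntime:
    """Stand-in for the SageMaker runtime client; returns a canned response."""

    def __init__(self, score):
        self.score = score
        self.calls = []

    def invoke_endpoint(self, **kwargs):
        # Record the arguments so they can be inspected afterwards.
        self.calls.append(kwargs)
        body = json.dumps({"predictions": [{"score": self.score}]}).encode()
        return {"Body": io.BytesIO(body)}


def predict_label(runtime, endpoint_name, event):
    """Send event['data'] as CSV and map the returned score to 'B' or 'M'."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="text/csv", Body=event["data"]
    )
    result = json.loads(response["Body"].read().decode())
    pred = int(result["predictions"][0]["score"])
    return "M" if pred == 1 else "B"


# Made-up feature values, not a row from the validation data.
event = {"data": "1.0,2.0,3.0"}
print(predict_label(FakeRuntime(score=0), "my-endpoint", event))
print(predict_label(FakeRuntime(score=1), "my-endpoint", event))
```

Passing the client in as a parameter is a design choice for this sketch: it lets you test the parsing and label mapping locally before connecting the function to a real endpoint.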

Create a REST API: Integration request setup

You can create an API by following these steps:

  1. On the API Gateway console, choose the REST API option.
  2. Choose Build.
  3. Select New API.
  4. For API name, enter a name (for example, BreastCancerPredition).
  5. Leave Endpoint Type as Regional.
  6. Choose Create API.
  7. On the Actions menu, choose Create resource.
  8. Enter a name for the resource (for example, predictbreastcancer).
  9. After the resource is created, on the Actions menu, choose Create Method to create a POST method.
  10. For Integration type, select Lambda Function.
  11. For Lambda function, enter the function you created.

When the setup is complete, you can deploy the API to a stage.

  1. On the Actions menu, choose Deploy API.
  2. Create a new stage called test.
  3. Choose Deploy.

This step gives you the invoke URL.
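
If you prefer to script the deployment rather than use the console, the step can be sketched with the Boto3 API Gateway client. The helper name deploy_api is hypothetical, and the sketch assumes you already know your REST API ID.

```python
def deploy_api(apigateway_client, rest_api_id: str, stage_name: str = "test") -> str:
    """Deploy the REST API to a stage and return the deployment ID."""
    response = apigateway_client.create_deployment(
        restApiId=rest_api_id, stageName=stage_name
    )
    return response["id"]
```

With credentials configured, a call such as deploy_api(boto3.client("apigateway"), "<your API ID>") deploys to the test stage used in this post.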

For more information on creating an API with API Gateway, see Creating a REST API in Amazon API Gateway. In addition, you can make the API more secure using various methods.

Now that you have an API and a Lambda function in place, let’s look at the test data.

Test data

When the sample notebook loads the dataset to the Amazon Simple Storage Service (Amazon S3) bucket that you specified, CSV files with data are loaded. The sample notebook separates the dataset into two files: training data and validation data. Each file is saved in its respective folder. The Amazon S3 path to the file looks like <your bucket name>/sagemaker/DEMO-breast-cancer-prediction/validation.

The following code is one row of the validation data from the file in the validation folder:


Test with Postman

Now that we have the Lambda function, REST API, and test data, let’s test it using Postman, which is an HTTP client for testing web services. Make sure to download the latest version.

When you deployed your API, it provided the invoke URL, which looks like


It follows the format


For more information about invoking an API in API Gateway, see Invoking a REST API in Amazon API Gateway.
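
As a sketch, the invoke URL for a Regional REST API can be assembled from its parts, using the pattern https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/{resource_name}. The function name build_invoke_url is hypothetical, and the API ID and Region below are placeholders rather than values from this post.

```python
def build_invoke_url(rest_api_id: str, region: str, stage: str, resource: str) -> str:
    """Assemble the invoke URL for a resource on a deployed REST API stage."""
    path = resource.lstrip("/")
    return f"https://{rest_api_id}.execute-api.{region}.amazonaws.com/{stage}/{path}"


# Placeholder API ID and Region; the stage and resource names match this post.
url = build_invoke_url("abc123defg", "us-east-1", "test", "predictbreastcancer")
print(url)
```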

  1. Enter the invoke URL into Postman.
  2. Choose POST as the method.
  3. On the Body tab, enter the test data.
  4. Choose Send to see the returned result as B for the row of test data we looked at earlier.
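
The same request can be made from a short client script instead of Postman. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the URL is a placeholder, and the feature values are made up. The body uses a data key because that is the key the Lambda function reads from the event.

```python
import json
import urllib.request


def build_prediction_request(invoke_url: str, csv_row: str) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the CSV row under 'data'."""
    body = json.dumps({"data": csv_row}).encode("utf-8")
    return urllib.request.Request(
        invoke_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Placeholder URL and made-up feature values; nothing is sent here.
request = build_prediction_request(
    "https://example.com/test/predictbreastcancer", "1.0,2.0,3.0"
)
print(request.get_method(), request.full_url)
```

Calling send(request) with your real invoke URL and a row of validation data returns the response from API Gateway.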


To learn more about SageMaker Inference, please refer to the following resources:


In this post, you created a model endpoint deployed and hosted by SageMaker. Then you created serverless components (a REST API and Lambda function) that invoke the endpoint. Now you know how to call an ML model endpoint hosted by SageMaker using serverless technology.

If you have feedback about this post, please leave it in the comments. If you have questions about implementing the example used in this post, you can also open a thread on the Developer Tools forum.

About the Author

Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring nature in the Pacific Northwest.