AWS Machine Learning Blog
Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda
March 2025: This post was reviewed and updated for accuracy.
At AWS Machine Learning (ML) workshops, customers often ask, “After I deploy an endpoint, where do I go from there?” You can deploy an ML model trained and validated with Amazon SageMaker AI as an online endpoint in production. Alternatively, you can use only the SageMaker functionality you need; for example, you can choose just to train a model or just to host one. Whether you use one SageMaker functionality or all of them, you invoke the model as an endpoint deployed somewhere.
The following diagram shows how the deployed model is called using a serverless architecture. Starting from the client side, a client script calls an Amazon API Gateway API action and passes parameter values. API Gateway is the layer that provides the API to the client. It also shields the backend so that the AWS Lambda function runs in a protected private network. API Gateway passes the parameter values to the Lambda function. The Lambda function parses the values and sends them to the SageMaker model endpoint. The model returns the response to the Lambda function, which sends it back to API Gateway. API Gateway then responds to the client with the generated response.
In this post, we show you how to invoke a model endpoint deployed by SageMaker using API Gateway and Lambda. For testing, we will be using cURL, a command-line tool.
Foundation Model – Meta Llama 2 7B
Amazon SageMaker JumpStart is a powerful feature of SageMaker that offers pre-built machine learning models and comprehensive end-to-end solutions, enabling users to rapidly start and scale their machine learning projects. It simplifies tasks like model deployment, customization, and training for a variety of common use cases, reducing the effort required to achieve results.
JumpStart supports deploying foundation models, computer vision models, and natural language processing models. For this demonstration, we will use the Meta Llama 2 7B foundation model, a state-of-the-art model designed for natural language understanding and generation tasks. Foundation models are pre-trained on extensive datasets and can be fine-tuned for specific applications, making them highly versatile and efficient for a wide range of use cases. We will use Llama to answer general questions.
Once the model is deployed in SageMaker, we can interact with it by invoking the model endpoint using the SageMaker runtime API. To eliminate the need to manage servers and infrastructure, this interaction is streamlined using API Gateway and AWS Lambda, which encapsulate the endpoint invocation and ensure scalability and ease of use. We will send the Meta Llama 2 model general prompts and display its responses.
Create a SageMaker JumpStart model endpoint
To create your model endpoint, complete the following steps:
- Open the Amazon SageMaker AI console.
- From the left-hand menu, navigate to Studio under Applications and IDEs.
- Click Create a SageMaker domain.
- Select Set up for single user (Quick Setup), then click Set up. Wait a few minutes for the SageMaker domain to be configured.
- Return to the left-hand menu, go to Foundation Models under JumpStart, and search for Meta Llama 2 7B Chat.
- Click View model, then select Open model in Studio, followed by Open Studio.
- In SageMaker Studio, click Deploy. On the Deploy model to endpoint page, leave the default settings and click Deploy.
Upon creation, copy the endpoint name. The default endpoint name looks like jumpstart-dft-meta-textgeneration-l-20XXXXXX-1XXXX.
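Before building the Lambda function, you can optionally verify the endpoint directly with the SageMaker runtime. The following is a minimal sketch; the payload fields and generation parameters are illustrative and depend on the model version, and the endpoint name is a placeholder:

```python
import json

import boto3

# Placeholder: replace with the endpoint name you copied above
endpoint_name = "jumpstart-dft-meta-textgeneration-l-20XXXXXX-1XXXX"

runtime = boto3.client("sagemaker-runtime")

# Illustrative payload; the exact inputs/parameters schema depends on the model version
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
    # Llama 2 endpoints require explicit acceptance of the model EULA
    CustomAttributes="accept_eula=true",
)
print(json.loads(response["Body"].read().decode("utf-8")))
```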
Create a Lambda function that calls the SageMaker runtime invoke_endpoint
Now we have a SageMaker model endpoint. Let’s look at how we call it from Lambda. We use the SageMaker runtime API, calling invoke_endpoint() on the Boto3 sagemaker-runtime client.
- On the Lambda console, on the Functions page, choose Create function.
- For Function name, enter a name.
- For Runtime, choose your runtime.
- For Execution role, keep Create a new role selected, or choose Use an existing role.
- Click Create function.
If you chose Create a new role, then after the Lambda function is created, go to the Configuration tab and find the name of the IAM role that was created. Click the role name to open it in the IAM console.
- Whether you created a new role or used an existing one, make sure it includes a policy that gives your function permission to invoke a model endpoint (a sample policy follows this list).
- Increase the timeout to 30 seconds.
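The following is a minimal sample policy granting the sagemaker:InvokeEndpoint action:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}
```

This sample uses "Resource": "*" for brevity; in production, scope it down to your endpoint’s ARN.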
The following is a sample Lambda function. It is a minimal sketch: the query request field and the generation parameters are illustrative, and the exact payload schema depends on the JumpStart model version.
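```python
import json
import os

import boto3

# Name of the SageMaker endpoint, read from a Lambda environment variable
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # API Gateway (proxy integration) delivers the request body as a JSON string
    data = json.loads(event["body"])

    # "query" is the illustrative request field used throughout this post;
    # the inputs/parameters schema depends on the JumpStart model version
    payload = {
        "inputs": data["query"],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
    }

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
        # Llama 2 endpoints require explicit acceptance of the model EULA
        CustomAttributes="accept_eula=true",
    )
    result = json.loads(response["Body"].read().decode("utf-8"))

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

The CustomAttributes="accept_eula=true" header indicates acceptance of the Llama 2 end user license agreement, which these endpoints require.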
ENDPOINT_NAME is an environment variable that holds the name of the SageMaker model endpoint you just deployed.
- Enter your endpoint name as the value of the ENDPOINT_NAME environment variable.
Now we have a SageMaker model endpoint and a Lambda function. Next, we create a REST API in API Gateway to invoke the Lambda function.
Create a REST API: Integration request setup
You can create an API by following these steps:
- On the Lambda function page, choose Add trigger.
- In Trigger configuration, choose API Gateway and select Create a new API.
- Under API type, select REST API. Under Security, select Open, and choose Add. We keep the API open for simplicity; for production use, we recommend adding a security mechanism such as IAM authorization or an API key.
- Copy the API endpoint URL from the Configuration section; we will use it for testing.
Test with cURL
Now that we have the Lambda function, REST API, and model endpoint, let’s test the setup using cURL, a command-line tool for sending and receiving data over various network protocols.
The following is a sample cURL command that we can use to get output from the deployed model. The URL and function name are placeholders; substitute the API endpoint URL you copied earlier:
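```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the capital of France?"}' \
  https://XXXXXXXXXX.execute-api.us-east-1.amazonaws.com/default/<your-function-name>
```

The "query" field matches the illustrative request field parsed by the Lambda sketch above.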
Conclusion
In this post, we demonstrated how to integrate a model deployed with Amazon SageMaker AI with AWS Lambda and Amazon API Gateway to create a seamless, serverless architecture for invoking a deployed machine learning model. Using SageMaker JumpStart, we deployed the Meta Llama 2 foundation model. This setup eliminates the need to manage underlying infrastructure while ensuring scalability and flexibility. With the combination of SageMaker, Lambda, and API Gateway, you can efficiently deploy, invoke, and scale machine learning models, empowering your applications with advanced AI capabilities.
If you have feedback about this post, please leave it in the comments.
About the Authors
Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring the nature of the Pacific Northwest.
Achintya Veer Singh is a Solutions Architect at AWS based in Bangalore. He works with AWS customers to address their business challenges by designing secure, performant, and scalable solutions leveraging the latest cloud technologies. He is passionate about technology and enjoys building and experimenting with AI/ML and Gen AI. Outside of work, he enjoys cooking, reading non-fiction books, and spending time with his family.
Rohit Raj is a Solutions Architect at AWS, specializing in serverless, and a member of the Serverless Technical Field Community. He continually explores new trends and technologies. He is passionate about helping customers build highly available, resilient, and scalable solutions on the cloud. Outside of work, he enjoys travelling, music, and outdoor sports.
Audit History
Last reviewed and updated in March 2025 by Achintya Veer Singh and Rohit Raj | Solutions Architects