AWS Partner Network (APN) Blog
How to Access the Jurassic-2 Large Language Model via an AWS Lambda Endpoint
By Sheldon Sides, Manager of Solutions Architecture – AWS
By Yuval Belfer, Developer Advocate – AI21 Labs
The ever-evolving landscape of artificial intelligence (AI) has given rise to transformative solutions. At the forefront of this revolution are generative AI and large language models (LLMs). The proliferation of these technologies presents an unparalleled opportunity for businesses seeking to leverage their vast capabilities.
Generative AI and LLMs such as AI21 Labs’ Jurassic models are reshaping how we engage with information. They help address information overload and time constraints with their ability to comprehend and produce human-like text. As a result, they play an increasingly significant role in enhancing customer experience and boosting operational efficiency.
This post walks through how to use the Jurassic-2 (J2) large language model and consume it via an AWS Lambda API endpoint. We’ll make calls to the J2 model through the publicly available Jurassic-2 Python SDK from within a Lambda function, and pass the LLM’s output back to the caller as a JSON result.
This post includes code to deploy the solution in your own Amazon Web Services (AWS) account as a proof of concept (PoC) that you can expand on further. Before jumping into the code, we’ll first review the solution’s high-level architecture.
AI21 Labs is an AWS Partner and AWS Marketplace Seller that’s a leader in generative AI and LLMs, revolutionizing the way people read and write. Founded in 2017, it was among the first companies to bring generative AI to the masses and, to date, offers enterprise solutions and consumer applications.
AI21 Studio is AI21 Labs’ developer platform that provides API access to state-of-the-art LLMs, powering natural language comprehension and generation features in thousands of live applications and services. Businesses use AI21 Studio’s suite of solutions to build generative AI-driven solutions and drive innovation, unlock new capabilities, and scale existing initiatives.
Knowledge Prerequisites
Our goal in this post is to show what’s possible when it comes to building generative AI solutions on AWS. There are many ways to deploy generative AI solutions on AWS, ranging from deploying a foundation model with Amazon SageMaker JumpStart, to building and training your own models on AWS custom silicon chips, to using a partner foundation model.
To fully benefit from this post, we assume you’re familiar with the following topics: AWS Lambda, Python, APIs, generative AI basics and terminology, and using generative AI APIs/SDKs.
Jurassic-2 Foundation Models
Let’s start by getting an understanding of what the Jurassic-2 series is. J2 is a family of large language models created by AI21 Labs that includes foundation models in three sizes: Ultra, Mid, and Light. The J2 models also support multiple languages out of the box.
AI21 Labs has also created task-specific models such as the Contextual Answers, Paraphrase, Summarize, and Grammatical Error Corrections APIs, which are tailor-made to carry out specific generative AI tasks. These task-specific APIs encapsulate actions that many model consumers perform frequently. Moreover, they don’t require prompt engineering and can be rolled out into production in a matter of minutes.
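As a quick illustration, a task-specific API can be called in just a few lines. The sketch below assumes the publicly available ai21 Python SDK exposes a Summarize wrapper with source and sourceType parameters; treat the exact function and field names as assumptions and check AI21’s documentation for the current interface.

```python
import ai21

ai21.api_key = "<YOUR_AI21_API_KEY>"  # illustrative placeholder

# Hypothetical sketch: summarize a passage with the task-specific Summarize
# API instead of prompt-engineering a general-purpose completion call.
response = ai21.Summarize.execute(
    source="Large language models are reshaping how we engage with information...",
    sourceType="TEXT",
)

# The exact shape of the response object is documented by AI21
print(response)
```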
For the remainder of this post, we’ll focus on using the Jurassic-2 Ultra model.
Architecture Overview
In this section, we’ll briefly review each part of the architecture that allows you to call the Jurassic-2 large language model.
Figure 1 – Using the Jurassic-2 model via AWS Lambda endpoint.
In section one [1] of the diagram above, the user communicates with the frontend application that sends requests to the Jurassic-2 model via the Lambda endpoint. Once the application receives the user’s query [2], the request is sent to the Lambda endpoint [3] so it can be processed.
Once the Lambda endpoint receives the incoming request, the Lambda function parses the request object to retrieve the user’s query [4].
After the function has parsed the user’s question from the incoming request object [5], a prompt is generated that will be passed to the J2 model. The request to the J2 model is made up of the prompt (user’s query) and a combination of parameters that the model expects when calling it.
Once the model parameters have been set, they are passed to the Jurassic-2 Complete API via the Python SDK [6] which handles the execution of the call to the model.
One thing to note is that the Lambda function does not actually contain the model. The Jurassic-2 Python SDK handles abstracting the call to the underlying J2 public API endpoint that hosts the model for you.
After the model receives the request, it processes it and returns a JSON result object as a response to the application layer [7]. The application layer is then responsible for parsing and displaying the model’s results to the user.
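To make the flow in steps [5] through [7] concrete, below is a minimal sketch of the SDK call involved. The API key placeholder is illustrative, and the j2-ultra model identifier is our assumption for the Jurassic-2 Ultra model:

```python
import ai21

ai21.api_key = "<YOUR_AI21_API_KEY>"  # illustrative placeholder

# Minimal sketch: the SDK sends the request to AI21's hosted J2 endpoint,
# so the model itself never runs inside your own infrastructure.
response = ai21.Completion.execute(
    model="j2-ultra",  # assumed identifier for Jurassic-2 Ultra
    prompt="Tell me a short story about a tiger and lion.",
    numResults=1,
    maxTokens=500,
    temperature=0.7,
)

print(response.completions[0].data.text)
```

In the Lambda function we review next, this same call is wrapped in a handler so the parameters can be supplied per request.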
Now that we have covered the architectural flow of the solution, let’s dive deeper into the code and different parameters the model expects.
Deploying Code
To deploy the solution in a demo environment, you can get the code from the AWS Samples GitHub repo. The repo contains a detailed deployment guide along with an AWS CloudFormation template for you to easily deploy and test the Lambda endpoint in a demo environment.
Reviewing AWS Lambda Endpoint Code
In this section, we’ll review the Lambda code that makes the call to the Jurassic-2 Ultra model. We’ll begin in the next section with the model parameters that are passed to the model when calling it.
Figure 2 – AWS Lambda code.
All Python code shown above can be found on the AWS Samples GitHub as part of an AWS CloudFormation template.
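Because Figure 2 is an image, here is a minimal sketch of what such a handler could look like, assuming the ai21 Python SDK. Apart from model_temp, which is discussed below, the AI21_API_KEY, model_name, num_results, and max_tokens environment variable names are hypothetical; see the repo for the actual code.

```python
import json
import os

import ai21

# Hypothetical sketch approximating the handler in Figure 2; the AI21_API_KEY,
# model_name, num_results, and max_tokens variable names are illustrative.
ai21.api_key = os.environ["AI21_API_KEY"]


def lambda_handler(event, context):
    # Parse the user's query from the body of the incoming request
    body = json.loads(event["body"])
    prompt = body["prompt"]

    # Model parameters come from the Lambda environment variables,
    # so no parameter values are hard-coded
    model_response = ai21.Completion.execute(
        model=os.environ.get("model_name", "j2-ultra"),
        prompt=prompt,
        numResults=int(os.environ.get("num_results", "1")),
        maxTokens=int(os.environ.get("max_tokens", "500")),
        temperature=float(os.environ.get("model_temp", "0.7")),
    )

    # Extract the completion text and hand it back to the caller as JSON
    completion_text = model_response.completions[0].data.text
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response": completion_text}),
    }
```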
Model Parameters
Before we dive into the individual model parameters, note that in the code in Figure 2 the parameter values are retrieved from the Lambda function’s environment variable settings. This avoids hard-coding any model parameter values in the code.
The one exception is the prompt itself, which is sent to the Lambda endpoint in the HTTP request body. For sensitive data such as API keys, consider using AWS Secrets Manager rather than environment variables.
Below, we will outline what each parameter is and its purpose.
Prompt
The prompt parameter on line 7 is parsed from the body of the incoming request and is contained in the event body. The prompt is the instruction you give to the model.
Model
The model parameter on line 23 allows you to set the model you’d like to call. In this example, we are calling the Jurassic-2 Ultra model. This is the most advanced and capable Jurassic-2 model currently available.
If you’d like to learn more about the different Jurassic-2 models, see the Jurassic-2 documentation.
numResults
The numResults parameter on line 26 is the number of results you’d like the model to return. A value greater than 1 is meaningful only in the case of non-greedy decoding; that is, temperature > 0.
In the example, we’re only requesting the model to return one result. If you would like the model to return multiple results, you can increase this number in the environment variables of the Lambda function.
maxTokens
The maxTokens parameter on line 29 is the maximum number of tokens you’d like the model to generate in its response. You can think of tokens as roughly the words and word fragments a model produces when processing your request. Note this is merely a threshold that cuts off longer generations; it does not encourage the model to produce longer output.
In this example, we have set the environment variable to use a maximum of 500 tokens.
To learn more about tokens, see the Tokenizer and Tokenization documentation.
Temperature
The last parameter is the temperature parameter on line 32. The temperature parameter can have a value between 0.0 and 1.0.
Think of the temperature parameter as controlling the level of randomness of the results generated by the model. The higher the temperature, the more unique, varied, and creative the results returned from the model will be; the opposite is true for a lower temperature. You can find more information about the temperature parameter in the AI21 documentation.
In the example code, the model_temp environment variable is set to 0.7. For many LLMs, the value of 0.7 seems to be a good middle ground to start with for your temperature value, but feel free to experiment with this parameter based on your desired output.
For a complete list of all possible model parameters, see the API Parameters documentation.
Calling the Model
Now that we have a high-level understanding of the model parameters, let’s review the code that executes the call to the model.
On line 35 of the code in Figure 2, we pass all of the model parameters to the Complete API, which is part of the Jurassic-2 SDK, to execute the call to the model. Once the method executes successfully, the returned value is stored in a variable called model_response.
The J2 model will return a JSON object that includes your response to the query you sent to the model, along with other relevant metadata. You can see a complete view of the JSON schema that’s returned from the model in the Jurassic-2 API documentation.
Extracting Text Response from the Model
On lines 44 and 49, we parse the text result, which is part of the JSON object returned by the Jurassic-2 model.
The code snippet below parses the completion text value from the results that are returned from the model:
model_response.completions[0].data.text
If you recall, in the Model Parameter section there’s a parameter called numResults which determines how many results the model will return. In this example, we set the numResults value to 1. This is why we parse the zero index of the completion object that’s returned from the model.
If you set the numResults value to something greater than one, you’d need to iterate over the model_response.completions array to get all the values that were returned by the J2 model.
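For example, a sketch of handling multiple completions might look like this:

```python
# Sketch: when numResults > 1, collect every completion's text
all_texts = [c.data.text for c in model_response.completions]
for text in all_texts:
    print(text)
```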
Returning Results from the Lambda Endpoint
The last thing we need to do is have the Lambda endpoint return the results to the caller. For simplicity, this code returns a JSON object that contains the model results in the JSON body. Once the results are returned to the Lambda endpoint’s caller, the data can be displayed as it would be in any other web or mobile application.
In the next section, we’ll briefly walk through how to test your Lambda endpoint results using AWS CloudShell with the curl command.
Testing Lambda Endpoint
Here, we’ll walk through testing the Lambda endpoint which calls the Jurassic-2 Ultra model. In this example, we’ll use AWS CloudShell to make a call to the endpoint using curl. If you would like to follow along and test your endpoint, be sure you have followed the instructions in the deployment guide.
To start testing out the J2 model results, open AWS CloudShell and copy this command:
curl -X POST -H "Content-Type: application/json" -d '{"prompt" : "Tell me a short story about a tiger and lion."}' https://<your-lambda-endpoint-url>.lambda-url.us-east-1.on.aws
You can find your Lambda endpoint URL by following the instructions in the deployment guide. You can see what the command looks like in the CloudShell image below.
Figure 3 – Calling endpoint from AWS CloudShell.
After you have run the command in CloudShell, you get the result outlined in red in Figure 4. Note your result will be different from what’s displayed below.
Each time you run the curl command, you should get a different result from the model since we set our model temperature parameter to 0.7. You can experiment with different model parameters to see what different results will be returned.
Figure 4 – Results returned from the Jurassic-2 model.
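If you’d prefer to test from Python rather than curl, a small sketch using the requests library makes the same call; the URL placeholder matches the curl example above:

```python
import requests

# Replace with your Lambda function URL from the deployment guide
endpoint = "https://<your-lambda-endpoint-url>.lambda-url.us-east-1.on.aws"

payload = {"prompt": "Tell me a short story about a tiger and lion."}
response = requests.post(endpoint, json=payload)

# Print the JSON body returned by the Lambda endpoint
print(response.json())
```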
Summary
We hope you found this solution useful and that it helps spark some ideas about how to integrate generative AI into your next solution. Below is a list of resources you can use to continue to expand your knowledge about running generative AI solutions on AWS:
- Generative AI on AWS
- AWS Partners: Reinventing Your Customers’ Business with Generative AI
- Generative AI Foundations on AWS Technical Deep Dive Series
- Generative AI Hands-on Course by DeepLearning.AI and AWS
- Amazon SageMaker JumpStart
- Amazon Bedrock
- Amazon Titan
- AWS Inferentia
- AWS Trainium
- Amazon CodeWhisperer
- Hugging Face on AWS
You can also learn more about AI21 Labs and the Jurassic-2 models in AWS Marketplace.
AI21 Labs – AWS Partner Spotlight
AI21 Labs is an AWS Partner and leader in generative AI and large language models, revolutionizing the way people read and write. Founded in 2017, it was among the first companies to bring generative AI to the masses and, to date, offers enterprise solutions and consumer applications.