AWS News Blog

Amazon Bedrock now provides access to Meta’s Llama 2 Chat 13B model


Update: November 29, 2023 — Today, we’re adding the Llama 2 70B model in Amazon Bedrock, in addition to the already available Llama 2 13B model. As its name implies, the Llama 2 70B model has 70 billion parameters, compared to 13 billion for the Llama 2 13B model, and it required considerably more compute to train. If you’re wondering when to use which model, consider Llama 2 13B for smaller-scale tasks such as text classification, sentiment analysis, and language translation, and Llama 2 70B for larger-scale tasks such as language modeling, text generation, and dialogue systems. According to Meta, the training of Llama 2 70B took 1,720,320 GPU-hours, the equivalent of 196.38 years of a single GPU. Start using the Llama 2 70B model in Amazon Bedrock today. We’re excited to see what you build with these models.

Today, we’re announcing the availability of Meta’s Llama 2 Chat 13B large language model (LLM) on Amazon Bedrock. With this launch, Amazon Bedrock becomes the first public cloud service to offer a fully managed API for Llama 2, Meta’s next-generation LLM. Now, organizations of all sizes can access Llama 2 Chat models on Amazon Bedrock without having to manage the underlying infrastructure. This is a step change in accessibility.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Stability AI, Amazon, and now Meta, along with a broad set of capabilities to build generative AI applications, simplifying the development while maintaining privacy and security. You can read more about Amazon Bedrock in Antje’s post here.

Llama 2 is a family of publicly available LLMs from Meta. The Llama 2 base model was pre-trained on 2 trillion tokens from online public data sources. According to Meta, the training of Llama 2 13B consumed 184,320 GPU-hours. That’s the equivalent of 21.04 years of a single GPU, not accounting for leap years.
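If you want to double-check that arithmetic, the conversion from GPU-hours to years of a single GPU is a one-liner (plain Python, nothing Bedrock-specific):

# Convert Meta's reported training times from GPU-hours to years of a single GPU
for model, gpu_hours in [('Llama 2 13B', 184_320), ('Llama 2 70B', 1_720_320)]:
    print(f"{model}: {gpu_hours / 24 / 365:.2f} years")  # prints 21.04 and 196.38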

Built on top of the base model, the Llama 2 Chat model is optimized for dialog use cases. It is fine-tuned with over 1 million human annotations (a technique known as reinforcement learning from human feedback or RLHF) and has undergone testing by Meta to identify performance gaps and mitigate potentially problematic responses in chat use cases, such as offensive or inappropriate responses.

To promote a responsible, collaborative AI innovation ecosystem, Meta established a range of resources for all who use Llama 2: individuals, creators, developers, researchers, academics, and businesses of any size. In particular, I like the Meta Responsible Use Guide, a resource for developers that provides best practices and considerations for building products powered by LLMs in a responsible manner, covering various stages of development from inception to deployment. This guide fits well in the set of AWS tools and resources to build AI responsibly.

You can now integrate the Llama 2 Chat model in your applications written in any programming language by calling the Amazon Bedrock API or by using the AWS SDKs or the AWS Command Line Interface (AWS CLI).

Llama 2 Chat in action
Those of you who read the AWS News Blog regularly know we like to show you the technologies we write about. So let’s write code to interact with Llama 2.

I was lucky enough to talk at the AWS UG Perú Conf a few weeks ago. Jeff and Marcia were there too. Jeff opened the conference with an inspiring talk about generative AI, and he used a wall of generated images of llamas, the emblematic animal from Perú. So what better subject to talk about with Llama 2 Chat than llamas?

(And before writing code, I can’t resist sharing two photos of llamas I took during my visit to Machu Picchu)

[Photos: a white llama and a brown llama at Machu Picchu]

To get started with a new model on Bedrock, I first navigate to Amazon Bedrock on the console. I select Model access on the bottom left pane, then select the Edit button on the top right side, and enable access to the Llama 2 Chat model.

[Screenshot: enabling access to the Llama 2 Chat model in the Amazon Bedrock console]

In the left navigation bar, under Playgrounds, I select Chat to interact with the model without writing any code.

[Screenshot: the Amazon Bedrock chat playground with the Llama 2 Chat model]

Now that I know I can access the model, I open a code editor on my laptop. I assume you have the AWS CLI configured, which allows the AWS SDK to locate your AWS credentials. I use Python for this demo, but I want to show that Bedrock can be called from any language; I also share a public gist with the same code sample written in the Swift programming language.

Returning to Python, I first run the ListFoundationModels API call to discover the modelId for Llama 2 Chat 13B.

import boto3

# The 'bedrock' client exposes control-plane operations, such as listing models
bedrock = boto3.client(service_name='bedrock', region_name='us-east-1')

# Ask for the foundation models provided by Meta and print their names and IDs
listModels = bedrock.list_foundation_models(byProvider='meta')
for model in listModels['modelSummaries']:
    print(f"{model['modelName']} : {model['modelId']}")

Running this code produces the list:

Llama 2 Chat 13B : meta.llama2-13b-chat-v1

I select the meta.llama2-13b-chat-v1 model ID and write the code to send a prompt to the Llama 2 Chat 13B model.

import boto3
import json

llamaModelId = 'meta.llama2-13b-chat-v1'
prompt = "What is the difference between a llama and an alpaca?"

# Inference parameters: max_gen_len caps the length of the response,
# while top_p and temperature control the randomness of the sampling
llamaPayload = json.dumps({
    'prompt': prompt,
    'max_gen_len': 512,
    'top_p': 0.9,
    'temperature': 0.2
})

# The 'bedrock-runtime' client is the data plane, used to invoke models
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)
response = bedrock_runtime.invoke_model(
    body=llamaPayload,
    modelId=llamaModelId,
    accept='application/json',
    contentType='application/json'
)

# The response body is a stream; read it and parse the JSON document
body = response.get('body').read().decode('utf-8')
response_body = json.loads(body)
print(response_body['generation'].strip())

The response is printed:

Llamas and alpacas are both members of the camelid family, 
but they are different species with distinct physical and behavioral characteristics. 
Here are some of the main differences: (...)

I truncated the answer for brevity.

You can tune the different request parameters to see how they influence the response.
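For example, here is a minimal sketch, reusing the same client and model ID as above, that sends the same prompt at several temperature values so you can compare how the randomness of the sampling changes the answer:

import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
llamaModelId = 'meta.llama2-13b-chat-v1'
prompt = "What is the difference between a llama and an alpaca?"

# Lower temperatures produce more deterministic answers,
# higher temperatures produce more varied ones
for temperature in [0.2, 0.5, 0.9]:
    payload = json.dumps({
        'prompt': prompt,
        'max_gen_len': 128,
        'top_p': 0.9,
        'temperature': temperature
    })
    response = bedrock_runtime.invoke_model(
        body=payload,
        modelId=llamaModelId,
        accept='application/json',
        contentType='application/json'
    )
    print(f"--- temperature = {temperature} ---")
    print(json.loads(response['body'].read())['generation'].strip())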

Optimizing your prompts
To get the best results out of the Llama 2 Chat model, you can experiment with a couple of prompt engineering techniques, such as using roles or chain-of-thought prompts.

Using roles, you tell the model how to behave by giving it a role and instructions before asking your question. For example:

Prompt: You are a teacher. You explain things for six-year-old kids.
What is the difference between a llama and an alpaca?

Answer: Great question! Both llamas and alpacas are animals that live in South America, and they look a little bit similar. But there are some differences that we can talk about.
First, let's talk about their fur. Llamas have longer fur than alpacas. It's like they have a big, fluffy coat that helps them stay warm in the cold weather. Alpacas, on the other hand, have shorter fur that's a little bit smoother.
Another difference is their ears. Llamas have banana-shaped ears that are really big, while alpacas have small, triangular ears.
Finally, let's talk about their size. Llamas are generally bigger than alpacas. They can grow up to 6 feet tall and weigh a lot, while alpacas are usually smaller, only growing up to 3 feet tall.
So, those are some of the main differences between llamas and alpacas. They're both really cool animals, and it's fun to learn about them!
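To send a role-based prompt programmatically, you simply prepend the role instructions to your question before invoking the model. Here is a minimal sketch with the same invocation code as before; the build_prompt helper is mine, not part of any SDK:

import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

def build_prompt(role, question):
    # Prepend the role instructions to the question, as in the example above
    return f"{role}\n{question}"

prompt = build_prompt(
    "You are a teacher. You explain things for six-year-old kids.",
    "What is the difference between a llama and an alpaca?"
)
payload = json.dumps({
    'prompt': prompt,
    'max_gen_len': 512,
    'top_p': 0.9,
    'temperature': 0.2
})
response = bedrock_runtime.invoke_model(
    body=payload,
    modelId='meta.llama2-13b-chat-v1',
    accept='application/json',
    contentType='application/json'
)
print(json.loads(response['body'].read())['generation'].strip())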

With chain-of-thought prompts, you ask the model to explain each step that led to the answer. For example:

Prompt: You are a very intelligent bot with exceptional critical thinking.
I went to the market and bought 10 apples. I gave 2 apples to your friend and 2 to the helper. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.

Answer:
1. You went to the market and bought 10 apples.
2. You gave 2 apples to your friend.
3. You gave 2 apples to the helper.
4. You went back to the market and bought 5 more apples.
5. You ate 1 apple.
Now, let's count how many apples you have:
10 apples (from the first trip to the market) - 2 apples (given to your friend) - 2 apples (given to the helper) = 6 apples
6 apples + 5 apples (from the second trip to the market) = 11 apples
11 apples - 1 apple (that you ate) = 10 apples
Therefore, you remain with 10 apples.
I hope this helps! Do you have any more questions or would you like to play again?
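Code-wise, chain-of-thought prompting changes nothing but the prompt string: you append a cue such as "Let's think step by step." to your question. A minimal sketch, again with the same invocation pattern:

import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

question = ("I went to the market and bought 10 apples. I gave 2 apples to your friend "
            "and 2 to the helper. I then went and bought 5 more apples and ate 1. "
            "How many apples did I remain with?")

# The trailing cue nudges the model to spell out its reasoning step by step
prompt = (f"You are a very intelligent bot with exceptional critical thinking.\n"
          f"{question}\nLet's think step by step.")

payload = json.dumps({'prompt': prompt, 'max_gen_len': 512, 'top_p': 0.9, 'temperature': 0.2})
response = bedrock_runtime.invoke_model(
    body=payload,
    modelId='meta.llama2-13b-chat-v1',
    accept='application/json',
    contentType='application/json'
)
print(json.loads(response['body'].read())['generation'].strip())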

Meta shared more examples in section 4.2 of their Getting to Know Llama notebook.

Availability
The Llama 2 Chat model is available today for all AWS customers in two of the AWS Regions where Bedrock is available: US East (N. Virginia) and US West (Oregon).

You will be charged for model inference. You can choose to be charged on a pay-as-you-go basis, with no upfront or recurring fees; AWS charges per input and output token processed. Alternatively, you can provision throughput sufficient to meet your application’s performance requirements in exchange for a time-based term commitment. The Amazon Bedrock pricing page has the details.

With this information, you’re prepared to start using Amazon Bedrock and the Llama 2 Chat model in your applications.

Go build!

-- seb
Sébastien Stormacq

Seb has been writing code since he first touched a Commodore 64 in the mid-eighties. He inspires builders to unlock the value of the AWS cloud, using his secret blend of passion, enthusiasm, customer advocacy, curiosity and creativity. His interests are software architecture, developer tools and mobile computing. If you want to sell him something, be sure it has an API. Follow him on Twitter @sebsto.