AWS News Blog
Amazon Bedrock now provides access to Meta’s Llama 2 Chat 13B model
|
Update: November 29, 2023 — Today, we’re adding the Llama 2 70B model in Amazon Bedrock, in addition to the already available Llama 2 13B model. As its name implies, the Llama 2 70B model has been trained on larger datasets than the Llama 2 13B model. If you’re wondering when to use which model, consider using Llama 13B for smaller-scale tasks such as text classification, sentiment analysis, and language translation, and Llama 2 70B for large-scale tasks such as language modeling, text generation, and dialogue systems. According to Meta, Llama 2 70B’s training took 1,720,320 GPU-hours, the equivalent of 196.38 years. Start using the Llama 2 70B model in Amazon Bedrock today. We’re excited to see what you build with these models.
—
Today, we’re announcing the availability of Meta’s Llama 2 Chat 13B large language model (LLM) on Amazon Bedrock. With this launch, Amazon Bedrock becomes the first public cloud service to offer a fully managed API for Llama 2, Meta’s next-generation LLM. Now, organizations of all sizes can access Llama 2 Chat models on Amazon Bedrock without having to manage the underlying infrastructure. This is a step change in accessibility.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Stability AI, Amazon, and now Meta, along with a broad set of capabilities to build generative AI applications, simplifying the development while maintaining privacy and security. You can read more about Amazon Bedrock in Antje’s post here.
Llama 2 is a family of publicly available LLMs by Meta. The Llama 2 base model was pre-trained on 2 trillion tokens from online public data sources. According to Meta, the training of Llama 2 13B consumed 184,320 GPU/hour. That’s the equivalent of 21.04 years of a single GPU, not accounting for bissextile years.
Built on top of the base model, the Llama 2 Chat model is optimized for dialog use cases. It is fine-tuned with over 1 million human annotations (a technique known as reinforcement learning from human feedback or RLHF) and has undergone testing by Meta to identify performance gaps and mitigate potentially problematic responses in chat use cases, such as offensive or inappropriate responses.
To promote a responsible, collaborative AI innovation ecosystem, Meta established a range of resources for all who use Llama 2: individuals, creators, developers, researchers, academics, and businesses of any size. In particular, I like the Meta Responsible Use Guide, a resource for developers that provides best practices and considerations for building products powered by LLMs in a responsible manner, covering various stages of development from inception to deployment. This guide fits well in the set of AWS tools and resources to build AI responsibly.
You can now integrate the LLama 2 Chat model in your applications written in any programming language by calling the Amazon Bedrock API or using the AWS SDKs or the AWS Command Line Interface (AWS CLI).
Llama 2 Chat in action
Those of you who read the AWS News blog regularly know we like to show you the technologies we write about. So let’s write code to interact with Llama2.
I was lucky enough to talk at the AWS UG Perú Conf a few weeks ago. Jeff and Marcia were there too. Jeff opened the conference with an inspiring talk about generative AI, and he used a wall of generated images of llamas, the emblematic animal from Perú. So what better subject to talk about with Llama 2 Chat than llamas?
(And before writing code, I can’t resist sharing two photos of llamas I took during my visit to Machu Picchu)
![]() |
![]() |
To get started with a new model on Bedrock, I first navigate to Amazon Bedrock on the console. I select Model access on the bottom left pane, then select the Edit button on the top right side, and enable access to the Llama 2 Chat model.
In the left navigation bar, under Playgrounds, I select Chat to interact with the model without writing any code.
Now that I know I can access the model, I open a code editor on my laptop. I assume you have the AWS Command Line Interface (AWS CLI) configured, which will allow the AWS SDK to locate your AWS credentials. I use Python for this demo, but I want to show that Bedrock can be called from any language. I also share a public gist with the same code sample written in the Swift programming language.
Returning to Python, I first run the ListFoundationModels API call to discover the modelId
for Llama 2 Chat 13B.
Running this code produces the list:
I select meta-llama2-chat-13b
model ID and write the code to send a prompt to the LLama 2 Chat 13B model.
The response is printed:
I redacted the answer for brevity.
You can tune the different request parameters to see how they influence the response.
Optimizing your prompts
To get the best results out of the Llama 2 Chat model, you can experiment with a couple of prompt engineering techniques, such as using roles or chain-of-thought prompts.
Using roles, you tell the model how to behave by giving it a role and instructions before asking your question. For example:
With chain-of-thought prompts, you ask the model to explain each step that led to the answer. For example:
Meta shared more examples in section 4.2 of their getting to know Llama notebook.
Availability
The Llama 2 Chat model is available today for all AWS customers in two of the AWS Regions where Bedrock is available: US East (N. Virginia) and US West (Oregon).
You will be charged for model inference. You can choose to be charged on a pay-as-you-go basis, with no upfront or recurring fees; AWS charges per processed input and output tokens. Or you can provision sufficient throughput to meet your application’s performance requirements in exchange for a time-based term commitment. The Bedrock pricing page has the details.
With this information, you’re prepared to start using Amazon Bedrock and the Llama 2 Chat model in your applications.
-- seb