Meta Llama 3 models are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that Meta Llama 3 foundation models are available through Amazon SageMaker JumpStart to deploy and run inference. The Llama 3 models are a collection of pre-trained and fine-tuned generative text models.

In this post, we walk through how to discover and deploy Llama 3 models via SageMaker JumpStart.

What is Meta Llama 3

Llama 3 comes in two parameter sizes, 8B and 70B, both with an 8k context length, and supports a broad range of use cases with improvements in reasoning, code generation, and instruction following. Llama 3 uses a decoder-only transformer architecture and a new tokenizer with a 128k-token vocabulary that improves model performance. In addition, Meta improved its post-training procedures, which substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. You can now combine the advantages of Llama 3 performance with MLOps controls from Amazon SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs. In addition, the model is deployed in an AWS secure environment under your VPC controls, helping provide data security.

What is SageMaker JumpStart

With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances in a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Llama 3 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, which lets you combine model performance with MLOps controls from SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping provide data security. Llama 3 models are available today for deployment and inference in Amazon SageMaker Studio in the us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon), eu-west-1 (Ireland), and ap-northeast-1 (Tokyo) AWS Regions.

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can discover various models by browsing through hubs named after model providers. You can find Llama 3 models in the Meta hub. If you don't see Llama 3 models, update your SageMaker Studio version by shutting down and restarting Studio. For more information, refer to Shut down and Update Studio Classic Apps.

You can also find Llama 3 models by searching for "Meta-llama-3" in the search box at the top left.

You can discover all Meta models available in SageMaker JumpStart by choosing the Meta hub.

Choosing a model card opens the corresponding model detail page, from which you can deploy the model by choosing Deploy.

Deploy a model

When you choose Deploy and acknowledge the EULA terms, deployment will start.

You can monitor the progress of the deployment on the page that appears after you choose Deploy.

Alternatively, you can choose Open notebook to deploy through the example notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code.

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
# Setting accept_eula=True confirms that you accept the Llama 3 EULA; deployment fails without it
predictor = model.deploy(accept_eula=True)

By default, accept_eula is set to False. You need to explicitly accept the EULA by setting accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy. You can also find the license agreement on the Llama website. The deployment code uses default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, refer to the JumpStartModel documentation in the SageMaker Python SDK.

The following table lists all the Llama 3 models available in SageMaker JumpStart along with their model IDs, default instance types, and the maximum number of total tokens (the number of input tokens plus the number of generated tokens) supported by each model.

Model Name	Model ID	Max Total Tokens	Default instance type
Meta-Llama-3-8B	meta-textgeneration-llama-3-8b	8192	ml.g5.12xlarge
Meta-Llama-3-8B-Instruct	meta-textgeneration-llama-3-8b-instruct	8192	ml.g5.12xlarge
Meta-Llama-3-70B	meta-textgeneration-llama-3-70b	8192	ml.p4d.24xlarge
Meta-Llama-3-70B-Instruct	meta-textgeneration-llama-3-70b-instruct	8192	ml.p4d.24xlarge
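Because the max total tokens limit is shared by the prompt and the generated text, it can help to check a request against it before sending it. The following is a minimal sketch that transcribes the table into a plain dictionary; the names MODELS, fits_budget, and max_new_tokens_allowed are hypothetical, and it assumes you have already counted the prompt's tokens with the Llama 3 tokenizer (not shown).

```python
# Hypothetical lookup table transcribed from the table above:
# model_id -> (max total tokens, default instance type)
MODELS = {
    "meta-textgeneration-llama-3-8b": (8192, "ml.g5.12xlarge"),
    "meta-textgeneration-llama-3-8b-instruct": (8192, "ml.g5.12xlarge"),
    "meta-textgeneration-llama-3-70b": (8192, "ml.p4d.24xlarge"),
    "meta-textgeneration-llama-3-70b-instruct": (8192, "ml.p4d.24xlarge"),
}


def fits_budget(model_id: str, input_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt tokens plus requested new tokens fit the model's limit."""
    max_total, _instance_type = MODELS[model_id]
    return input_tokens + max_new_tokens <= max_total


def max_new_tokens_allowed(model_id: str, input_tokens: int) -> int:
    """Largest max_new_tokens that still fits, or 0 if the prompt alone is too long."""
    max_total, _instance_type = MODELS[model_id]
    return max(0, max_total - input_tokens)


print(fits_budget("meta-textgeneration-llama-3-8b-instruct", 8000, 64))  # True (8064 <= 8192)
print(max_new_tokens_allowed("meta-textgeneration-llama-3-70b", 8100))   # 92
```

If a long chat history pushes the prompt close to the limit, max_new_tokens_allowed shows how much room is left for the response.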

Run inference

After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor. The fine-tuned instruct models (Llama 3 8B Instruct and 70B Instruct) accept a history of chats between the user and the chat assistant and generate the assistant's next message. The pre-trained models (Llama 3 8B and 70B) require a string prompt and perform text completion on it.

predictor.predict(payload)
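The exact shape of the value returned by predictor.predict depends on the serving container and the predictor's deserializer. The following sketch assumes a Text Generation Inference style JSON response, either a dict with a generated_text key or a list containing such a dict; extract_text is a hypothetical helper, so inspect an actual response from your endpoint before relying on it.

```python
from typing import Any


def extract_text(response: Any) -> str:
    """Pull the generated text out of an assumed TGI-style response.

    Handles both {"generated_text": ...} and [{"generated_text": ...}].
    """
    if isinstance(response, list):
        if not response:
            raise ValueError("empty response list")
        response = response[0]
    if isinstance(response, dict) and "generated_text" in response:
        return response["generated_text"]
    raise ValueError(f"unexpected response shape: {type(response).__name__}")


# With a deployed endpoint (requires AWS, not run here):
#   text = extract_text(predictor.predict(payload))
print(extract_text([{"generated_text": " fromage"}]))
```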

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. This is not the same as the number of words, because the model's vocabulary is not the same as the English vocabulary, and each token may not be an English word. The temperature parameter controls the randomness of the output: a higher temperature produces more creative output but also makes hallucinations more likely. The top_p parameter restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches top_p. All the inference parameters are optional.
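To build intuition for temperature and top_p, the following sketch applies both to a toy next-token distribution in pure Python. It illustrates the standard technique (a temperature-scaled softmax followed by nucleus, or top-p, filtering); it is not the code that runs inside the endpoint, and the logits are made up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens, higher flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches top_p, then renormalize. Returns a list of (token_index, probability)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [(i, probs[i] / mass) for i in kept]


def sample(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus filtering."""
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = [i for i, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(indices, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(softmax_with_temperature(toy_logits, 0.6))  # peaked distribution
print(softmax_with_temperature(toy_logits, 1.5))  # flatter distribution
print(top_p_filter(softmax_with_temperature(toy_logits, 0.6), 0.9))
```

The defaults mirror the parameters used for the sample outputs in this post (temperature 0.6, top_p 0.9): at that temperature, the least likely tokens fall outside the 0.9 nucleus and can never be sampled.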

Example prompts for the 70B model

You can use Llama 3 models for text completion on any piece of text. Through text generation, you can perform a variety of tasks, such as question answering, language translation, and sentiment analysis. The input payload to the endpoint looks like the following code:

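payload = {
    "inputs": "Simply put, the theory of relativity states that",  # str: the prompt
    # Optional: inference parameters; omit the "parameters" key to use the defaults
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
}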
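If you build payloads programmatically, a small helper can leave out parameters you don't set, so the endpoint falls back to its defaults. The following is a minimal sketch; build_payload is a hypothetical name, and its range checks are conservative assumptions rather than limits documented for these endpoints.

```python
from typing import Optional


def build_payload(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build a request body in the format above, omitting parameters left as None."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    params = {}
    if max_new_tokens is not None:
        if max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        params["max_new_tokens"] = int(max_new_tokens)
    if top_p is not None:
        if not 0.0 < top_p <= 1.0:  # assumed valid range
            raise ValueError("top_p must be in (0, 1]")
        params["top_p"] = float(top_p)
    if temperature is not None:
        if temperature <= 0.0:  # assumed valid range
            raise ValueError("temperature must be positive")
        params["temperature"] = float(temperature)
    payload = {"inputs": prompt}
    if params:  # "parameters" is optional, so leave it out entirely when empty
        payload["parameters"] = params
    return payload


print(build_payload("I believe the meaning of life is", max_new_tokens=64, top_p=0.9, temperature=0.6))
print(build_payload("I believe the meaning of life is"))  # no "parameters" key
```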

The following are some example prompts and the text generated by the model. All outputs are generated with the inference parameters {"max_new_tokens":64, "top_p":0.9, "temperature":0.6}.

Input: Simply put, the theory of relativity states that
Output: 1) the laws of physics are the same for all observers, and 2) the speed of light is constant for all observers. The first part of the theory is known as the special theory of relativity, and the second part is known as the general theory of relativity.\nThe special theory of relativity states

Input: I believe the meaning of life is
Output:  to live a life of purpose, on purpose. I believe that we are all here to make a difference in the world, and that we are all here to make a difference in the lives of others. I believe that we are all here to make a difference in the world, and that we are all here to mak

Input: A brief message congratulating the team on the launch:\n\nHi everyone,\n\nI just 
Output: wanted to say congratulations on the launch of the new website! It looks great and I'm sure it will be a valuable resource for our customers. I know that a lot of hard work went into making this happen, so thank you all for your efforts.\n\nBest regards,\nJohn Doe\n

In the next example, we show how to use Llama 3 models with few-shot in-context learning, where we provide training samples to the model in the prompt. We only run inference on the deployed model; during this process, the model weights do not change.

Input:
Translate English to French:\nsea otter => loutre de mer\npeppermint => menthe poivrée\nplush girafe => girafe peluche\ncheese =>
Output:  fromage\npanda => panda\npenguin => pingouin\npolar bear => ours polaire\npuffin => macareux\nsloth => paresseux\nwalrus => morse\nwhale => baleine\nzebra => zèbre\nTranslate English to German:\n
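A few-shot prompt like the one above can be assembled from a list of example pairs. Note also that the base model keeps continuing the pattern after the answer ("panda => panda", and so on), so you may want to keep only the first line of the output. The following sketch shows both steps; few_shot_prompt and first_completion are hypothetical helpers, and cutting at the first newline is one simple post-processing choice, not something the endpoint does for you.

```python
def few_shot_prompt(instruction, examples, query, sep=" => "):
    """Build a few-shot prompt: an instruction line, one 'input => output' line per
    example, then the query followed by the separator for the model to complete."""
    lines = [instruction]
    lines += [f"{src}{sep}{tgt}" for src, tgt in examples]
    lines.append(f"{query}{sep}".rstrip())  # end with "cheese =>" and no trailing space
    return "\n".join(lines)


def first_completion(output):
    """Keep only the text before the first newline, dropping the extra pairs the
    base model tends to generate after the answer."""
    return output.split("\n", 1)[0].strip()


prompt = few_shot_prompt(
    "Translate English to French:",
    [
        ("sea otter", "loutre de mer"),
        ("peppermint", "menthe poivrée"),
        ("plush girafe", "girafe peluche"),
    ],
    "cheese",
)
print(prompt)
print(first_completion(" fromage\npanda => panda\npenguin => pingouin"))  # fromage
```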

Example prompts for the 70B-Instruct model

Llama 3 instruct models are optimized for dialogue use cases. The input to the instruct model endpoints is the previous history between the chat assistant and the user, so you can ask questions that are contextual to the conversation so far. You can also provide a system configuration, such as a persona, that defines the chat assistant's behavior. While the input payload format is the same as for the base pre-trained models, the input text should be formatted in the following manner:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

[SYSTEM_TEXT]<|eot_id|><|start_header_id|>user<|end_header_id|>

[USER_TEXT]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In this instruction template, you can optionally start with a system role and include as many alternating roles as desired in the turn-based history. The final role should always be assistant, followed by two newline characters.
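Rather than concatenating the special tokens by hand, you can render a list of role/content messages into the template above. The following is a minimal sketch; format_llama3_chat is a hypothetical helper that mirrors the template shown in this post, and it leaves checking the order of roles to the caller.

```python
def format_llama3_chat(messages):
    """Render a list of {"role": ..., "content": ...} dicts into the Llama 3 instruct template.

    Every turn is wrapped in header tokens and closed with <|eot_id|>; the prompt then
    ends with an open assistant header followed by two newlines, so the model writes
    the next assistant turn.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # The final, open role is always assistant, followed by two newlines
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


# Reproduces the input of the "Always answer with Haiku" example in this post
prompt = format_llama3_chat([
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
])
payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6}}
```

The resulting payload has the same shape as the one used for the pre-trained models; only the inputs string changes.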

Next, consider a few example prompts and responses from the model. In the following example, the user is asking a simple question to the assistant.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The classic condiment! Mayonnaise is a thick, creamy emulsion sauce made from a combination of oil, egg yolks, acid (such as vinegar or lemon juice), and seasonings. Here's a simple recipe to make mayonnaise at home:\n\n**Ingredients:**\n\n* 2 egg yolks\n* 1 tablespoon lemon juice or vinegar (such as apple cider vinegar or white wine vinegar)\n* 1/2 teaspoon Dijon mustard (optional, but recommended for flavor)\n* 1/2 cup (120 ml) neutral-tasting oil, such as canola, grapeseed, or sunflower oil\n* Salt, to taste\n\n**Instructions:**\n\n1. **Start with room temperature ingredients**: This is crucial for emulsification to occur.\n2. **In a medium-sized bowl**, whisk together the egg yolks, lemon juice or vinegar, and Dijon mustard (if using) until well combined.\n3. **Slowly add the oil**: While continuously whisking the egg yolk mixture, slowly pour in the oil in a thin, steady stream. Start with a very slow drizzle and gradually increase the flow as the mixture thickens.\n4. **Whisk constantly**: Keep whisking until the mixture has doubled in volume and has a thick, creamy consistency. This should take about 5-7 minutes.\n5. **Check the consistency**: Stop whisking when the mayonnaise has reached the desired consistency. It should be smooth, thick, and hold its shape when the whisk is lifted.\n6. **Season with salt**: Add salt to taste, and whisk until well combined.\n7. **Taste and adjust**: If the mayonnaise is too thick, add a little water. If it's not thick enough, add a little more oil.\n\n**Tips and Variations:**\n\n* Use a neutral-tasting oil to avoid flavoring the mayonnaise.\n* If you're using a stand mixer, use the whisk attachment and start with a slow speed, increasing as needed.\n* Add flavorings like minced garlic, chopped fresh herbs (e.g., parsley, dill, or tarragon), or grated ginger to create different flavor profiles.\n* For a lighter mayonnaise, use only 1 egg yolk or add a little water to thin it out.\n* If the mayonnaise breaks or separates, don't worry! Simply start again with a new egg yolk and slowly whisk in the broken mayonnaise.\n\n**Homemade Mayonnaise vs. Store-Bought:**\n\nWhile store-bought mayonnaise is convenient, homemade mayonnaise offers several advantages:\n\n* Freshness: Homemade mayonnaise is made to order, ensuring optimal flavor and texture.\n* Customization: You can tailor the flavor to your liking with various seasonings and ingredients.\n* No preservatives: Homemade mayonnaise doesn't contain added preservatives, making it a healthier option.\n\nEnjoy your freshly made mayonnaise!

In the following example, the user has a conversation with the assistant about tourist sites in Paris. Then the user inquires about the first option recommended by the chat assistant.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great:\n\n1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).\n2. **Panoramic views**: The Eiffel Tower offers breathtaking 360-degree views of the city from its observation decks. You can see many of Paris' famous landmarks, such as the Seine River, the Louvre, and Montmartre.\n3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and is a popular spot for proposals, weddings, and romantic dinners. The tower's iron latticework and twinkling lights at night add to its romantic ambiance.\n4. **Historical significance**: The Eiffel Tower has played a significant role in French history, serving as a communication tower during World War I and a symbol of French culture and engineering.\n5. **Accessibility**: The Eiffel Tower is easily accessible by public transportation or on foot, and there are various ways to experience it, including taking the stairs, elevator, or a guided tour.\n6. **Nighttime spectacle**: The Eiffel Tower is even more magical at night, when it's illuminated with thousands of twinkling lights. The light show, known as the \"Eiffel Tower Sparkle,\" takes place every hour on the hour from dusk to midnight.\n7. **Iconic status**: The Eiffel Tower has been featured in countless films, literature, and art, making it an instantly recognizable symbol of Paris and France.\n\nOverall, the Eiffel Tower is a must-visit attraction in Paris, offering a unique combination of history, engineering, romance, and stunning views.
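In a multi-turn conversation like the Paris example, your application keeps the history and appends each assistant reply and new user question before the next request. The following sketch checks that a history follows the turn structure described for the template: an optional leading system turn, then alternating turns, so that the rendered prompt ends with the open assistant role. validate_history is a hypothetical helper, and requiring the first non-system turn to come from the user is an assumption based on the examples in this post.

```python
def validate_history(messages):
    """Check an optional leading system turn followed by alternating user and
    assistant turns that start and end with user (an assumption based on the
    examples), so the rendered prompt's final open role is assistant.

    Raises ValueError on the first problem found; returns None if valid.
    """
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]
    if not roles:
        raise ValueError("history needs at least one user turn")
    for i, role in enumerate(roles):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"turn {i}: expected {expected!r}, got {role!r}")
    if roles[-1] != "user":
        raise ValueError("the last turn must come from the user")


# The Paris conversation: user question, assistant answer, follow-up question
# (the assistant's answer is abbreviated here)
paris = [
    {"role": "user", "content": "I am going to Paris, what should I see?"},
    {"role": "assistant", "content": "Paris, the capital of France, is known for ..."},
    {"role": "user", "content": "What is so great about #1?"},
]
validate_history(paris)  # passes silently
```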

In the following examples, we set the system configuration.

Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: Eiffel's iron lace\nRiver Seine's gentle flow by\nMontmartre's charm calls

Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: ✈️🛬️🇺🇸

Clean up

After you're done running the notebook, make sure to delete all the resources that you created in the process so that you stop incurring charges. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Llama 3 models in SageMaker Studio. You now have access to four Llama 3 foundation models that contain billions of parameters. Because foundation models are pretrained, they can also help lower training and infrastructure costs and enable customization for your use case. Check out SageMaker JumpStart in SageMaker Studio now to get started.

About the Authors

Kyle Ulrich is an Applied Scientist II at AWS
Xin Huang is a Senior Applied Scientist at AWS
Qing Lan is a Senior Software Developer Engineer at AWS
Haotian An is a Software Developer Engineer II at AWS
Christopher Whitten is a Software Development Engineer II at AWS
Tyler Osterberg is a Software Development Engineer I at AWS
Manan Shah is a Software Development Manager at AWS
Jonathan Guinegagne is a Senior Software Developer Engineer at AWS
Adriana Simmons is a Senior Product Marketing Manager at AWS
June Won is a Senior Product Manager at AWS
Ashish Khetan is a Senior Applied Scientist at AWS
Rachna Chadha is a Principal Solution Architect – AI/ML at AWS
Deepak Rupakula is a Principal GTM Specialist at AWS