AWS Machine Learning Blog
Llama 3.2 models from Meta are now available in Amazon SageMaker JumpStart
Today, we are excited to announce the availability of Llama 3.2 models in Amazon SageMaker JumpStart. Llama 3.2 offers multi-modal vision and lightweight models representing Meta’s latest advancement in large language models (LLMs), providing enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, these new models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features that help you build a new generation of AI experiences. SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.
In this post, we show how you can discover and deploy the Llama 3.2 11B Vision model using SageMaker JumpStart. We also share the supported instance types and context lengths for all the Llama 3.2 models available in SageMaker JumpStart. Although not covered in this post, you can also deploy and fine-tune the lightweight text models using SageMaker JumpStart.
Llama 3.2 models are available in all AWS Regions where SageMaker JumpStart is available. Note that Meta restricts your use of the multi-modal models if you are located in the European Union; see Meta’s community license agreement for details.
Llama 3.2 overview
Llama 3.2 represents Meta’s latest advancement in LLMs. Llama 3.2 models are offered in a range of sizes, from lightweight text-only models to small and medium-sized multi-modal models. The larger multi-modal models come in two parameter sizes, 11B and 90B, with a 128,000-token context length, and are capable of sophisticated reasoning tasks, including multi-modal support for high-resolution images. The lightweight text-only models come in two parameter sizes, 1B and 3B, with a 128,000-token context length, and are suitable for edge devices. Additionally, there is a new Llama Guard 3 11B Vision safeguard model, which is designed to support responsible innovation and system-level safety.
Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy cutting-edge generative AI applications, such as image reasoning, and are also more accessible for edge applications. The new models are also designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications.
SageMaker JumpStart overview
SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process.
Prerequisites
To try out the Llama 3.2 models in SageMaker JumpStart, you need the following prerequisites:
- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
- Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
- Access to accelerated instances (GPUs) for hosting the LLMs.
Discover Llama 3.2 models in SageMaker JumpStart
SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This provides multiple options to discover and use hundreds of models for your specific use case.
SageMaker Studio is a comprehensive IDE that offers a unified, web-based interface for performing all aspects of the ML development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available for deployment to inference endpoints on SageMaker Inference.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane or by choosing JumpStart from the Home page.
Alternatively, you can use the SageMaker Python SDK to programmatically access and use SageMaker JumpStart models. This approach allows for greater flexibility and integration with existing AI/ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
Deploy Llama 3.2 multi-modality models for inference using SageMaker JumpStart
On the SageMaker JumpStart landing page, you can discover all public pre-trained models offered by SageMaker. You can choose the Meta model provider tab to discover all the Meta models available in SageMaker.
If you’re using SageMaker Studio Classic and don’t see the Llama 3.2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.
You can choose the model card to view details about the model, such as its license, the data used to train it, and how to use it. You can also find two buttons, Deploy and Open Notebook, which help you use the model.
When you choose either button, a pop-up window will show the End-User License Agreement (EULA) and acceptable use policy for you to accept.
Upon acceptance, you can proceed to the next step to use the model.
Deploy Llama 3.2 11B Vision model for inference using the Python SDK
When you choose Deploy and accept the terms, model deployment will start. Alternatively, you can deploy through the example notebook by choosing Open Notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using a notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker.
You can deploy a Llama 3.2 11B Vision model using SageMaker JumpStart with the following SageMaker Python SDK code:
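The following is a minimal sketch that uses the JumpStartModel class from the SageMaker Python SDK; the model_id value comes from the table later in this post, and the instance type and endpoint name fall back to the model's defaults:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Llama 3.2 11B Vision Instruct from the SageMaker JumpStart model catalog
model = JumpStartModel(model_id="meta-vlm-llama-3-2-11b-vision-instruct")

# Deploys to the model's default instance type (ml.p4d.24xlarge);
# accept_eula=True indicates you accept Meta's end-user license agreement
predictor = model.deploy(accept_eula=True)
```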
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configuration. You can change these configurations by specifying non-default values in JumpStartModel. To successfully deploy the model, you must manually set accept_eula=True as a deploy method argument. After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
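The exact request format is not reproduced in this extract; the following sketch assumes a Messages API-style payload with a max_tokens parameter, which is a common schema for Llama 3.2 chat endpoints deployed through JumpStart:

```python
payload = {
    "messages": [
        {"role": "user", "content": "What is Amazon SageMaker JumpStart?"}
    ],
    "max_tokens": 256,
    "temperature": 0.6,
}

# predictor is the object returned by model.deploy() in the previous step
response = predictor.predict(payload)
print(response)
```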
Recommended instances and benchmark
The following table lists all the Llama 3.2 models available in SageMaker JumpStart along with the model_id, default instance types, and the maximum number of total tokens (sum of the number of input tokens and the number of generated tokens) supported for each of these models. For increased context length, you can modify the default instance type in the SageMaker JumpStart UI.
| Model Name | Model ID | Default instance type | Supported instance types |
| --- | --- | --- | --- |
| Llama-3.2-1B | meta-textgeneration-llama-3-2-1b, meta-textgenerationneuron-llama-3-2-1b | ml.g6.xlarge (128K context length), ml.trn1.2xlarge (128K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-1B-Instruct | meta-textgeneration-llama-3-2-1b-instruct, meta-textgenerationneuron-llama-3-2-1b-instruct | ml.g6.xlarge (128K context length), ml.trn1.2xlarge (128K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B | meta-textgeneration-llama-3-2-3b, meta-textgenerationneuron-llama-3-2-3b | ml.g6.xlarge (128K context length), ml.trn1.2xlarge (128K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B-Instruct | meta-textgeneration-llama-3-2-3b-instruct, meta-textgenerationneuron-llama-3-2-3b-instruct | ml.g6.xlarge (128K context length), ml.trn1.2xlarge (128K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-11B-Vision | meta-vlm-llama-3-2-11b-vision | ml.p4d.24xlarge (128K context length) | ml.p4d.24xlarge, ml.p4de.24xlarge, ml.p5.48xlarge |
| Llama-3.2-11B-Vision-Instruct | meta-vlm-llama-3-2-11b-vision-instruct | ml.p4d.24xlarge (128K context length) | ml.p4d.24xlarge, ml.p4de.24xlarge, ml.p5.48xlarge |
| Llama-3.2-90B-Vision | meta-vlm-llama-3-2-90b-vision | ml.p5.48xlarge (128K context length) | ml.p4d.24xlarge, ml.p4de.24xlarge, ml.p5.48xlarge |
| Llama-3.2-90B-Vision-Instruct | meta-vlm-llama-3-2-90b-vision-instruct | ml.p5.48xlarge (128K context length) | ml.p4d.24xlarge, ml.p4de.24xlarge, ml.p5.48xlarge |
| Llama-Guard-3-11B-Vision | meta-vlm-llama-guard-3-11b-vision | ml.p4d.24xlarge | ml.p4d.24xlarge, ml.p4de.24xlarge, ml.p5.48xlarge |
Llama 3.2 models have been evaluated on over 150 benchmark datasets, demonstrating competitive performance with leading FMs.
Inference and example prompts for Llama-3.2 11B Vision
You can use the Llama 3.2 11B and 90B models for text and image or vision reasoning use cases. You can perform a variety of tasks, such as image captioning, image-text retrieval, visual question answering and reasoning, document visual question answering, and more. The input payload to the endpoint looks like the following code examples.
Text-only input
The following is an example of text-only input:
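Because the original payload is not reproduced here, the snippet below is a sketch assuming the same Messages API-style schema; the system prompt and sampling parameters are illustrative values:

```python
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Explain the difference between supervised and unsupervised learning in two sentences.",
        },
    ],
    "max_tokens": 256,
    "temperature": 0.2,
    "top_p": 0.9,
}

response = predictor.predict(payload)
print(response)
```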
This produces the following response:
Single-image input
You can set up vision-based reasoning tasks with Llama 3.2 models with SageMaker JumpStart as follows:
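A convenient starting point is a small helper that encodes a local image file as a base64 string so it can be embedded in the request payload (the helper name is our own, not part of the SDK):

```python
import base64


def encode_image(image_path: str) -> str:
    """Read a local image file and return its base64-encoded contents as a string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```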
Let’s load an image from the open source MATH-Vision dataset:
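The file name below is illustrative; substitute the path of an image you have downloaded from the dataset:

```python
# Illustrative path; replace with an image saved from the MATH-Vision dataset
image_path = "math_vision_problem.png"
base64_image = encode_image(image_path)
```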
We can structure the message object with our base64 image data:
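The image_url content type carrying a base64 data URL is an assumption about the endpoint's request schema; adjust it if your endpoint expects a different format:

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Solve the problem shown in the image and explain each step.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```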
This produces the following response:
Multi-image input
The following code is an example of multi-image input:
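As with the single-image case, the payload below is a sketch under the same schema assumptions; the file names are illustrative:

```python
# Encode two local images with the helper shown earlier (file names are illustrative)
base64_image_1 = encode_image("chart_q1.png")
base64_image_2 = encode_image("chart_q2.png")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two charts and summarize the main differences."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image_1}"}},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image_2}"}},
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```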
This produces the following response:
Clean up
To avoid incurring unnecessary costs, when you’re done, delete the SageMaker endpoints using the following code snippets:
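Assuming you still have the predictor object returned by model.deploy(), the following deletes the model and its endpoint:

```python
# Delete the model and endpoint created by model.deploy()
predictor.delete_model()
predictor.delete_endpoint()
```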
Alternatively, to use the SageMaker console, complete the following steps:
- On the SageMaker console, under Inference in the navigation pane, choose Endpoints.
- Search for the endpoints you deployed in this post.
- On the endpoint details page, choose Delete.
- Choose Delete again to confirm.
Conclusion
In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including Meta’s most advanced and capable models to date. Get started with SageMaker JumpStart and Llama 3.2 models today. For more information about SageMaker JumpStart, see Train, deploy, and evaluate pretrained models with SageMaker JumpStart and Getting started with Amazon SageMaker JumpStart.
About the Authors
Supriya Puragundla is a Senior Solutions Architect at AWS
Armando Diaz is a Solutions Architect at AWS
Sharon Yu is a Software Development Engineer at AWS
Siddharth Venkatesan is a Software Development Engineer at AWS
Tony Lian is a Software Engineer at AWS
Evan Kravitz is a Software Development Engineer at AWS
Jonathan Guinegagne is a Senior Software Engineer at AWS
Tyler Osterberg is a Software Engineer at AWS
Sindhu Vahini Somasundaram is a Software Development Engineer at AWS
Hemant Singh is an Applied Scientist at AWS
Xin Huang is a Senior Applied Scientist at AWS
Adriana Simmons is a Senior Product Marketing Manager at AWS
June Won is a Senior Product Manager at AWS
Karl Albertsen is Head of ML Algorithms and JumpStart at AWS