Hugging Face on AWS

Train and deploy Hugging Face models in minutes with Amazon SageMaker, AWS Trainium, and AWS Inferentia

Get started with Hugging Face on AWS

Overview

With Hugging Face on AWS, you can access, evaluate, customize, and deploy hundreds of publicly available foundation models (FMs) through Amazon SageMaker on NVIDIA GPUs, as well as on the purpose-built AI chips AWS Trainium and AWS Inferentia, in a matter of clicks. These easy-to-use flows, which are supported on the most popular FMs in the Hugging Face model hub, allow you to further optimize the performance of your models for your specific use cases while significantly lowering costs. Code snippets for SageMaker are available on every model page on the model hub under the Train and Deploy dropdown menus.
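As a rough illustration of what a Deploy snippet for SageMaker can look like, the sketch below builds the environment variables that the Hugging Face inference container reads and wraps the endpoint creation in a function. It is a minimal sketch, not the exact snippet shown on any model page: the helper names `hub_env` and `deploy_from_hub` are hypothetical, the model ID and instance type are example choices, and the framework versions are left as arguments because valid combinations depend on the container release. Calling `deploy_from_hub` assumes the `sagemaker` Python SDK is installed and that you have AWS credentials and an execution role.

```python
def hub_env(model_id: str, task: str) -> dict:
    """Build the environment variables read by the Hugging Face inference container."""
    if not model_id or not task:
        raise ValueError("model_id and task are both required")
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(env, role, transformers_version, pytorch_version, py_version,
                    instance_type="ml.m5.xlarge"):
    """Create a real-time endpoint (hypothetical helper; needs AWS access)."""
    # Imported lazily so the rest of this sketch runs without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=env,
        role=role,  # IAM role ARN that SageMaker assumes
        transformers_version=transformers_version,
        pytorch_version=pytorch_version,
        py_version=py_version,
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# Example model ID and task, chosen for illustration only.
env = hub_env("distilbert-base-uncased-finetuned-sst-2-english", "text-classification")
print(env)
```

The configuration is kept separate from the deployment call so it can be inspected or reviewed before any billable resources are created.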

Behind the scenes, these experiences are built on top of the Hugging Face AWS Deep Learning Containers (DLCs), which provide you with a fully managed experience for building, training, and deploying state-of-the-art FMs using Amazon SageMaker. These DLCs remove the need to package dependencies and optimize your ML workload for the targeted hardware. For example, AWS and Hugging Face collaborate on the open-source Optimum Neuron library, which is packaged in the DLCs built for AWS AI chips to deliver price-performance benefits with minimal overhead.

Benefits

Access and deploy publicly available FMs with just a few clicks

Hugging Face offers a wide array of pre-trained FMs such as Meta Llama 3, Mistral, Falcon 2, and StarCoder that you can securely access and deploy via Amazon SageMaker JumpStart on AWS Trainium, AWS Inferentia, and NVIDIA GPUs with just a few clicks. SageMaker also provides enhanced security by allowing you to use your virtual private cloud (VPC) and deploy FMs in network isolation.

Maximize performance while lowering costs with AWS AI chips

Get high performance with the broadest set of accelerated EC2 instances and support for popular frameworks such as PyTorch, TensorFlow, and JAX. AWS Trainium can help you lower training costs by up to 50%, and AWS Inferentia2 can lower inference costs by up to 40%, compared with comparable EC2 instances.

Customize FMs with advanced techniques for your use case

Using Amazon SageMaker, you can customize publicly available models with advanced techniques to improve model quality for specific tasks and enable production workloads at scale. You can apply techniques such as prompt engineering, retrieval-augmented generation (RAG), and fine-tuning methods including parameter-efficient fine-tuning (PEFT), low-rank adaptation (LoRA), reinforcement learning from human feedback (RLHF), and supervised fine-tuning (SFT).
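To give a sense of why low-rank adaptation counts as parameter-efficient, the sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning and under LoRA. It assumes the standard LoRA formulation, where a frozen weight of shape d_out x d_in is augmented by two trainable matrices of shapes d_out x r and r x d_in. The layer size (4096 x 4096) and rank (8) are illustrative values, not taken from any specific model on this page.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added by LoRA to one d_out x d_in weight matrix."""
    # Adapter B has shape (d_out, rank); adapter A has shape (rank, d_in).
    return rank * (d_out + d_in)


# Illustrative sizes only.
d_out, d_in, rank = 4096, 4096, 8

full = d_out * d_in                              # every weight is trainable
lora = lora_trainable_params(d_out, d_in, rank)  # only the adapters are trainable

print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Under these assumptions, the adapters for this layer hold well under 1% of the parameters that full fine-tuning would update.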

Accelerate innovation with purpose-built ML tools

Take advantage of Amazon SageMaker’s purpose-built tools for every step of the FM development lifecycle. With Amazon SageMaker, you can evaluate, deeply customize, and deploy models with optimized performance, latency, and cost. You can deploy FMs in real time or asynchronously, and use multi-model endpoints and other advanced deployment techniques to have full control over cost and performance. Hugging Face Text Generation Inference (TGI), the advanced serving stack for deploying and serving large language models (LLMs), supports NVIDIA GPUs as well as Inferentia2 on SageMaker, so you can optimize for higher throughput and lower latency while reducing costs.
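A TGI-backed endpoint is typically configured through container environment variables and queried with a JSON payload. The sketch below shows one plausible shape for both. The helper names `tgi_env`, `tgi_payload`, and `deploy_tgi` are hypothetical, the model ID is a placeholder, and the environment variable names follow published Hugging Face LLM container examples, so they may differ between container versions. The deployment function assumes the `sagemaker` Python SDK, AWS credentials, and an execution role; the GPU instance type is an example choice.

```python
import json


def tgi_env(model_id, num_gpus, max_input_length, max_total_tokens):
    """Environment variables for the TGI container (all values must be strings)."""
    if max_total_tokens <= max_input_length:
        raise ValueError("max_total_tokens must exceed max_input_length")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def tgi_payload(prompt, max_new_tokens=128, temperature=0.7):
    """Request body in the inputs/parameters format that TGI accepts."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }


def deploy_tgi(env, role, instance_type="ml.g5.2xlarge"):
    """Create a real-time TGI endpoint (hypothetical helper; needs AWS access)."""
    # Imported lazily so the config helpers run without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    image_uri = get_huggingface_llm_image_uri("huggingface")
    model = HuggingFaceModel(image_uri=image_uri, env=env, role=role)
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# "your-org/your-model" is a placeholder, not a real model ID.
env = tgi_env("your-org/your-model", num_gpus=1, max_input_length=2048, max_total_tokens=4096)
payload = tgi_payload("Summarize the benefits of managed inference.", max_new_tokens=64)
print(env)
print(json.dumps(payload))
```

Once an endpoint exists, the predictor returned by `deploy_tgi` could be called with `predictor.predict(payload)`.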

Use cases

Content summarization

Produce concise summaries of articles, blog posts, and documents to identify the most important information, highlight key takeaways, and more quickly distill information. Hugging Face provides a variety of models for content summarization, including Meta Llama 3.

Chat support or virtual assistants

Streamline customer self-service processes and reduce operational costs by automating responses for customer service queries through generative AI-powered chat support and virtual assistants. Hugging Face provides models that can be used for chat support or virtual assistants, including instruction-tuned Meta Llama 3 and Falcon 2 models.

Content generation

Create personalized, engaging, and high-quality content, such as short stories, essays, blogs, social media posts, images, and web page copy. Hugging Face provides models for content generation, including Mistral.

Code generation

Accelerate application development with code suggestions. Hugging Face provides models that can be used for code generation, including StarCoder.

Document vectorization

By vectorizing documents with embedding models, you unlock powerful capabilities for information retrieval, question answering, semantic search, contextual recommendations, and document clustering. These applications enhance the way users interact with information, making it easier to discover, explore, and leverage relevant knowledge from large document collections.
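The retrieval step behind semantic search can be sketched in a few lines: once each document has a vector, a query vector is compared against all of them and the closest documents are returned. The example below uses cosine similarity over tiny hand-written vectors. The document IDs and three-dimensional vectors are made up for illustration; in practice the vectors would come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for real embeddings.
docs = {
    "billing-faq": [0.9, 0.1, 0.0],
    "gpu-setup": [0.1, 0.9, 0.2],
    "refund-policy": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, docs, k=2))  # ['billing-faq', 'refund-policy']
```

A linear scan like this is fine for small collections; larger ones usually rely on a vector index for approximate nearest-neighbor search.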

Videos

Deploying Hugging Face models with Amazon SageMaker and AWS Inferentia2

SageMaker JumpStart: deploy Hugging Face models in minutes!

Deep Dive: Hugging Face models on AWS AI Accelerators

Resources

Build a Hugging Face Text Classification model in Amazon SageMaker Jumpstart

AWS Inferentia and AWS Trainium deliver lowest cost to deploy Llama 3 models in Amazon SageMaker JumpStart

Enabling over 100,000 models on AWS Inferentia2 with Amazon SageMaker

Hugging Face Text Generation Inference on AWS Inferentia2

Deploy Meta Llama 3 70B on AWS Inferentia2 with Hugging Face Optimum Neuron

AWS and Hugging Face collaborate to make generative AI more accessible and cost efficient

Hugging Face Embedding container for Amazon SageMaker

Meta Llama 3 8B

Getting started

Get started with Hugging Face in Amazon SageMaker JumpStart

Get started with Hugging Face Optimum Neuron

View the tutorial