AWS Machine Learning Blog

AWS and Hugging Face collaborate to make generative AI more accessible and cost efficient

We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision models used to create generative AI applications. Generative AI applications can perform a variety of tasks, including text summarization, answering questions, code generation, image creation, and writing essays and articles.

AWS has a deep history of innovation in generative AI. For example, Amazon uses AI to deliver a conversational experience with Alexa that customers interact with billions of times each week, and is increasingly using generative AI as part of new experiences like Create with Alexa. In addition, M5, a group within Amazon Search that helps teams across Amazon bring large models to their applications, trained large models to improve search results. AWS is constantly innovating across all areas of ML, including infrastructure, tools on Amazon SageMaker, and AI services such as Amazon CodeWhisperer, a service that improves developer productivity by generating code recommendations based on the code and comments in an IDE. AWS also created purpose-built ML accelerators for the training (AWS Trainium) and inference (AWS Inferentia) of large language and vision models on AWS.

Hugging Face selected AWS because it offers flexibility across state-of-the-art tools to train, fine-tune, and deploy Hugging Face models, including Amazon SageMaker, AWS Trainium, and AWS Inferentia. Developers using Hugging Face can now more easily optimize performance and lower cost to bring generative AI applications to production faster.

High-performance and cost-efficient generative AI

Building, training, and deploying large language and vision models is an expensive and time-consuming process that requires deep expertise in machine learning (ML). Since the models are very complex and can contain hundreds of billions of parameters, generative AI is largely out of reach for many developers.

To close this gap, Hugging Face is now collaborating with AWS to make it easier for developers to access AWS services and deploy Hugging Face models specifically for generative AI applications. The benefits are faster training, and low-latency, high-throughput inference at scale. For example, the Amazon EC2 Trn1 instances powered by AWS Trainium deliver faster time to train while offering up to 50% cost-to-train savings over comparable Amazon EC2 instances. Amazon EC2’s new Inf2 instances, powered by the latest generation of AWS Inferentia, are purpose-built to deploy the latest generation of large language and vision models, and improve on Inf1 by delivering up to 4x higher throughput and up to 10x lower latency. Developers can use AWS Trainium and AWS Inferentia through managed services such as Amazon SageMaker, a service with tools and workflows for ML, or they can self-manage on Amazon EC2.
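The figures above are all "up to" bounds, so it can help to see what they imply in the best case for a given workload. The following is a minimal sketch of that arithmetic only; the function names and the baseline numbers in the usage example are hypothetical, and real savings depend on the model and workload.

```python
from typing import Tuple


def best_case_trn1_cost(baseline_cost: float, max_savings: float = 0.50) -> float:
    """Lowest cost-to-train implied by an 'up to 50%' savings claim."""
    return baseline_cost * (1.0 - max_savings)


def best_case_inf2(
    inf1_throughput: float,
    inf1_latency_ms: float,
    throughput_gain: float = 4.0,
    latency_reduction: float = 10.0,
) -> Tuple[float, float]:
    """Best-case Inf2 throughput and latency implied by 'up to 4x' and 'up to 10x'."""
    return inf1_throughput * throughput_gain, inf1_latency_ms / latency_reduction


# Hypothetical baseline: a $10,000 training run, and an Inf1 endpoint
# serving 100 requests/second at 50 ms latency.
trn1_cost = best_case_trn1_cost(10_000.0)
inf2_throughput, inf2_latency_ms = best_case_inf2(100.0, 50.0)
```

These values are ceilings on the improvement, not predictions; measuring your own model on both instance types is the only way to know the actual numbers.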

Get started today

Customers can start using Hugging Face models on AWS in three ways: through SageMaker JumpStart, the Hugging Face AWS Deep Learning Containers (DLCs), or the tutorials for deploying models to AWS Trainium or AWS Inferentia. The Hugging Face DLC is packed with optimized transformers, datasets, and tokenizers libraries so that you can fine-tune and deploy generative AI applications at scale in hours instead of weeks, with minimal code changes. SageMaker JumpStart and the Hugging Face DLCs are available in all regions where Amazon SageMaker is available and come at no additional cost. Read the documentation and discussion forums to learn more, or try the sample notebooks today.
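As an illustration of the DLC route, here is a minimal sketch of deploying a Hugging Face Hub model to a SageMaker endpoint with the SageMaker Python SDK. It is one possible approach, not the only one. The helper names, the default instance type, and the container version strings are assumptions chosen for illustration; check the SDK documentation for the versions currently supported. Running the deployment requires the `sagemaker` package, AWS credentials, and an IAM role with SageMaker permissions.

```python
from typing import Any, Dict


def build_hub_model_env(model_id: str, task: str) -> Dict[str, str]:
    """Environment variables the Hugging Face inference container reads
    to pull a model from the Hugging Face Hub."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_hub_model(
    role_arn: str,
    model_id: str,
    task: str,
    instance_type: str = "ml.m5.xlarge",  # assumption: pick a type that fits your model
) -> Any:
    """Create a SageMaker endpoint that serves the given Hub model."""
    # Imported here so the helper above works without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=build_hub_model_env(model_id, task),
        role=role_arn,
        # Assumed version combination; it must match an available DLC.
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# Example environment for a sentiment-analysis model from the Hub.
env = build_hub_model_env(
    "distilbert-base-uncased-finetuned-sst-2-english", "text-classification"
)
```

Calling `deploy_hub_model(...)` returns a predictor object whose `predict` method sends requests to the endpoint. Remember to delete the endpoint when you are done, because it is billed while it is running.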