Posted On: Mar 23, 2021

Today we are announcing new Hugging Face integrations with Amazon SageMaker to help data scientists develop, train, and tune state-of-the-art natural language processing (NLP) models more quickly and easily.

The field of natural language processing, which drives use cases such as chatbots, sentiment analysis, question answering, and search, has experienced a renaissance over the past few years. In particular, the Transformer deep learning architecture has been responsible for some of the largest state-of-the-art models to date, such as T5 and GPT-3. However, given their size, training and optimizing NLP models requires time, resources, and skill. Since 2016, Hugging Face has been a leader in the NLP community thanks to its transformers library, which features over 7,000 pre-trained models in 164 languages, making it easier for developers to get started. With over 41,000 GitHub stars and over 25 million downloads, the transformers library has become the de facto place for developers and data scientists to find NLP models.

The Hugging Face AWS Deep Learning Container (DLC) and the Hugging Face estimator in the Amazon SageMaker Python SDK further extend the ease with which developers and data scientists can get started with NLP on AWS. The Hugging Face DLC contains the Hugging Face transformers, datasets, and tokenizers libraries, optimized for SageMaker to take advantage of the SageMaker distributed training libraries, and the Hugging Face estimator enables developers and data scientists to run NLP scripts as SageMaker training jobs with minimal additional code. Hugging Face developers can now more easily develop on Amazon SageMaker and benefit from its cost efficiency, scalability, production readiness, and high security bar.
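As a minimal sketch of what that looks like, the snippet below launches a Hugging Face training script as a SageMaker training job with the Hugging Face estimator. The `role`, `training_input_path`, and `test_input_path` variables, the `train.py` script, and the specific version pins and hyperparameters are illustrative assumptions for your own account and workload, not prescribed values.

```python
from sagemaker.huggingface import HuggingFace

# Assumed to exist in your environment: an IAM execution role ARN and
# S3 URIs for the training and test datasets.
huggingface_estimator = HuggingFace(
    entry_point="train.py",          # your Hugging Face training script
    source_dir="./scripts",          # directory containing the script and its requirements
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.4",      # illustrative versions supported by the Hugging Face DLC
    pytorch_version="1.6",
    py_version="py36",
    hyperparameters={
        "epochs": 3,
        "train_batch_size": 32,
        "model_name": "distilbert-base-uncased",
    },
)

# Run the script as a SageMaker training job on the Hugging Face DLC.
huggingface_estimator.fit({"train": training_input_path, "test": test_input_path})
```

The estimator packages the script, provisions the requested instances with the Hugging Face DLC, and streams the job logs back to your notebook, so the only code you maintain is the training script itself.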

The Hugging Face DLC and SageMaker SDK are available in all regions where Amazon SageMaker is available and come at no additional cost. Read the launch blog or documentation to learn more, or access the sample notebooks to try out the new integrations.