Hugging Face BERT on AWS Inferentia
Get the highest performance at the lowest cost on Hugging Face BERT inference
Hugging Face is a leading repository for BERT-based NLP models, which are a common foundation for many NLP applications. As more companies deploy Hugging Face BERT models into production, they face cost, performance, and time-to-market challenges. Amazon EC2 Inf1 instances, powered by AWS Inferentia, are purpose-built for deep learning inference and are ideal for BERT models.
Save up to 70% on cost per inference
Inf1 instances deliver up to 70% lower inference costs than comparable GPU-based EC2 instances for many natural language processing applications such as text classification, language translation, sentiment analysis, and conversational AI.
Deploy easily with a few lines of code
With support for Hugging Face models in the Neuron SDK, you can easily compile and run inference using pre-trained or fine-tuned transformer models with just a few lines of code. Inf1 instances support popular ML frameworks such as PyTorch and TensorFlow.
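As a minimal sketch of that workflow, the following assumes an Inf1 instance with the torch-neuron package installed; the checkpoint name, sequence length, and file names are illustrative, not prescriptive:

```python
# Sketch: compiling a Hugging Face BERT model for Inferentia with the
# Neuron SDK (torch-neuron). Run on an Inf1 instance with torch-neuron
# installed; "bert-base-uncased" is an assumed example checkpoint.
import torch
import torch_neuron  # registers torch.neuron.trace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True makes the model traceable (returns tuples, not dicts)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torchscript=True
)
model.eval()

# Neuron compiles for fixed tensor shapes, so pad to a fixed length
example = tokenizer(
    "Hello, Inferentia!",
    padding="max_length", max_length=128, truncation=True,
    return_tensors="pt",
)
inputs = (example["input_ids"], example["attention_mask"])

# Compile to a Neuron-optimized TorchScript module and save it
model_neuron = torch.neuron.trace(model, example_inputs=inputs)
model_neuron.save("bert_neuron.pt")

# Later: load the compiled artifact and run inference on Inferentia
loaded = torch.jit.load("bert_neuron.pt")
logits = loaded(*inputs)
```

The compiled artifact is a standard TorchScript file, so the serving code only needs `torch.jit.load`; the fixed-shape padding above is why the compile step needs representative example inputs.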
Enjoy up to 2.3x higher throughput
Inf1 instances deliver up to 2.3x higher throughput than comparable GPU-based Amazon EC2 instances. Inf1 instances are optimized for inference performance for small batch sizes, enabling real-time applications to maximize throughput and meet latency requirements.

BERT-Base numbers derived from the NVIDIA performance page: PyTorch 1.9, sequence length 128, FP16.
Customer Stories

Adevinta found that AWS Inferentia reduced prediction latency by up to 92%, at 75% lower cost, compared to the best alternatives it initially evaluated when deploying Hugging Face BERT models. "It was, in other words, like having the best of GPU power at CPU cost."

Amazon Advertising found that running BERT models on Inferentia decreased latency by 30% and costs by 71%. "...the performance with AWS Inferentia was so impressive that I actually had to re-run the benchmarks to make sure they were correct!"

Sprinklr was able to deploy a [BERT] model on Inf1 instances in under two weeks. With ample resources and support, it found the migration to Inf1 straightforward. "The support from AWS helps us boost our customer satisfaction and staff productivity."