Artificial Intelligence

Moises Hernandez Fernandez

Author: Moises Hernandez Fernandez

BERT inference on G4 instances using Apache MXNet and GluonNLP: 1 million requests for 20 cents

Bidirectional Encoder Representations from Transformers (BERT) [1] has become one of the most popular models for natural language processing (NLP) applications. BERT can outperform other models in several NLP tasks, including question answering and sentence classification. Training the BERT model on large datasets is expensive and time consuming, and achieving low latency when performing inference […]