BERT inference on G4 instances using Apache MXNet and GluonNLP: 1 million requests for 20 cents
Bidirectional Encoder Representations from Transformers (BERT) [1] has become one of the most popular models for natural language processing (NLP) applications. BERT can outperform other models in several NLP tasks, including question answering and sentence classification. Training the BERT model on large datasets is expensive and time-consuming, and achieving low latency when performing inference […]
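As a quick illustration of the stack this post refers to, the following is a minimal sketch of single-sentence BERT inference with GluonNLP's pretrained model zoo on Apache MXNet. It assumes GluonNLP 0.x; the model variant, dataset name, maximum sequence length, and example sentence are illustrative choices, not values taken from the post.

```python
# Minimal sketch: load a pretrained BERT encoder with GluonNLP 0.x and run
# inference on a single sentence with Apache MXNet. Model name, dataset name,
# and the example sentence are assumptions for illustration only.
import mxnet as mx
import gluonnlp as nlp

# Use the GPU on a G4 instance if MXNet was built with CUDA support,
# otherwise fall back to CPU.
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

# Load a pretrained 12-layer BERT base encoder and its vocabulary.
model, vocab = nlp.model.get_model(
    'bert_12_768_12',
    dataset_name='book_corpus_wiki_en_uncased',
    use_decoder=False,
    use_classifier=False,
    ctx=ctx,
)

# Tokenize one sentence and convert it into BERT's input format.
tokenizer = nlp.data.BERTTokenizer(vocab, lower=True)
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128, pair=False)
token_ids, valid_length, segment_ids = transform(('BERT inference on a G4 instance.',))

# Add a batch dimension and run the encoder.
words = mx.nd.array([token_ids], ctx=ctx)
segments = mx.nd.array([segment_ids], ctx=ctx)
valid_len = mx.nd.array([valid_length], ctx=ctx)
seq_encoding, cls_encoding = model(words, segments, valid_len)
print(seq_encoding.shape, cls_encoding.shape)  # (1, 128, 768) and (1, 768)
```

The per-token encodings (`seq_encoding`) feed token-level tasks such as question answering, while the pooled `[CLS]` encoding (`cls_encoding`) is the usual input to a sentence classification head.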
