DeePhi Descartes Efficient Speech Recognition Engine
Product Overview
This is an end-to-end ASR (Automatic Speech Recognition) system by DeePhi with FPGA acceleration on AWS F1. We modified the Baidu DeepSpeech2 framework (https://github.com/SeanNaren/deepspeech.pytorch) as the basis for our algorithm, software and hardware co-design, and used the LibriSpeech 1000h dataset (http://www.openslr.org/12/) for model training and compression. The model consists of 2 convolution layers (each with Batch Normalization and Hardtanh), 5 bi-directional LSTM layers and 1 fully connected layer, followed by a Softmax layer. The FPGA accelerates the CNN and LSTM layers, while the remaining parts run on the CPU. For 1 second of test audio, the entire end-to-end ASR system achieves a latency of 20.59 ms on AWS F1 with this acceleration, about a 2.06x speedup over a cuDNN-based solution tested locally on a P4 GPU. Users can run the included test scripts for both a CPU/FPGA performance comparison and single-sentence recognition.
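To put the quoted figures in context, the short sketch below derives two numbers from them: the real-time factor (processing time divided by audio duration) and the GPU latency implied by the speedup. It assumes that the 2.06x speedup is the ratio of GPU latency to FPGA latency for the same 1-second audio, which the overview does not state explicitly; the function names are illustrative and are not part of the product.

```python
# Figures quoted in the product overview.
AUDIO_MS = 1000.0        # 1 second of test audio
FPGA_LATENCY_MS = 20.59  # end-to-end latency on AWS F1
SPEEDUP = 2.06           # reported speedup over the cuDNN baseline on a P4 GPU


def real_time_factor(latency_ms: float, audio_ms: float) -> float:
    """Processing time per unit of audio; values below 1 are faster than real time."""
    return latency_ms / audio_ms


def implied_baseline_latency_ms(accel_latency_ms: float, speedup: float) -> float:
    """Baseline latency, ASSUMING speedup = baseline latency / accelerated latency."""
    return accel_latency_ms * speedup


rtf = real_time_factor(FPGA_LATENCY_MS, AUDIO_MS)
gpu_ms = implied_baseline_latency_ms(FPGA_LATENCY_MS, SPEEDUP)

print(f"real-time factor on F1: {rtf:.4f}")
print(f"implied GPU latency:    {gpu_ms:.1f} ms")
```

Under that assumption, the GPU baseline would be roughly 42 ms for the same audio. This is a derived estimate, not a measured result.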
Version
By
DeePhi Tech
Categories
Operating System
Linux/Unix, CentOS 3.10.0-693.2.2.el7.x86_64
Delivery Methods