13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Large Scale Hierarchical Neural Network Language Models

Hong-Kwang Kuo (1), Ebru Arısoy (1), Ahmad Emami (1), Paul Vozila (2)

(1) IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
(2) Nuance Communications, One Wayside Road, Burlington, MA, USA

Feed-forward neural network language models (NNLMs) are known to improve both perplexity and word error rate (WER) in speech recognition compared with conventional n-gram language models. We present experimental results showing how much the WER can be improved by increasing the scale of the NNLM, in terms of both model size and training data. However, training such large models can take a very long time. To speed up training, we implemented a hierarchical NNLM approximation that splits up the training events, parallelizes training, and reduces the output vocabulary size of each sub-network. Training time was reduced by a factor of about 20, e.g. from 50 days to 2 days, with no degradation in WER. Using English Broadcast News data (350M words), we obtained significant improvements over the baseline n-gram language model, competitive with recently published recurrent neural network language model (RNNLM) results.
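
The abstract does not say how the hierarchy is built, so the following is only a minimal sketch of one common way to shrink each sub-network's output layer: a class-based factorization P(w | h) = P(c | h) · P(w | c, h), where c is the class of word w and h is the hidden representation of the history. The class assignment, the layer sizes, the random weights, and the names `HierarchicalOutput`, `word_classes`, and `hidden` are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, hidden):
    """Multiply a weight matrix (list of rows) by a hidden vector."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


class HierarchicalOutput:
    """Hypothetical two-level output layer: one class softmax plus one
    small word softmax per class (an assumed design, not the authors')."""

    def __init__(self, word_classes, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows):
            return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                    for _ in range(rows)]

        # Map each word to (class index, index inside that class).
        self.class_of = {
            word: (c, i)
            for c, words in enumerate(word_classes)
            for i, word in enumerate(words)
        }
        self.class_weights = init(len(word_classes))
        # Each sub-network only scores the words of its own class.
        self.word_weights = [init(len(words)) for words in word_classes]

    def prob(self, word, hidden):
        """P(word | h) = P(class | h) * P(word | class, h)."""
        c, i = self.class_of[word]
        p_class = softmax(linear(self.class_weights, hidden))[c]
        p_word = softmax(linear(self.word_weights[c], hidden))[i]
        return p_class * p_word


# Toy vocabulary of six words in three assumed classes.
word_classes = [["the", "a"], ["cat", "dog", "bird"], ["runs"]]
model = HierarchicalOutput(word_classes, hidden_dim=4)
hidden = [0.5, -0.2, 0.1, 0.3]

# The factorized probabilities still form a distribution over the vocabulary.
total = sum(model.prob(w, hidden) for words in word_classes for w in words)
print(f"sum of word probabilities: {total:.6f}")
```

In a layout like this, each per-class softmax normalizes over far fewer outputs than the full vocabulary, and the sub-networks could in principle be trained in parallel on the events that fall into their own class, which is the kind of saving the abstract describes.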

Index Terms: neural network language models

Bibliographic reference. Kuo, Hong-Kwang / Arısoy, Ebru / Emami, Ahmad / Vozila, Paul (2012): "Large scale hierarchical neural network language models", in INTERSPEECH-2012, 1672-1675.