Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models

Albert Zeyer, Ralf Schlüter, Hermann Ney

Online recognition requires the acoustic model to provide posterior probabilities after a limited time delay, given the online input audio data. This necessitates unidirectional modeling, and the standard solution is to use unidirectional long short-term memory (LSTM) recurrent neural networks (RNNs) or feed-forward neural networks (FFNNs).

It is known that bidirectional LSTMs are more powerful and perform better than unidirectional LSTMs. To demonstrate the performance difference, we first compare several bidirectional and unidirectional LSTM topologies.

Furthermore, we apply a modification to bidirectional RNNs to enable online recognition: we move a window over the input stream and perform one forward pass through the RNN on each window. We then combine the posteriors of all forward passes and renormalize them. Our experiments show that this online-enabled bidirectional LSTM performs as well as the offline bidirectional LSTM and much better than the unidirectional LSTM.
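The windowing scheme can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation. The window length, the step size, and the combination rule are choices made for this example. Here, each frame's posteriors are summed over all windows that contain it and then renormalized per frame. `forward_window` is a hypothetical stand-in for one forward pass of a trained bidirectional LSTM, and `toy_forward` is a toy function used only to exercise the code.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]         # one acoustic feature vector
Posteriors = List[List[float]]  # per-frame class probabilities


def windowed_bidirectional_posteriors(
    frames: Sequence[Frame],
    forward_window: Callable[[Sequence[Frame]], Posteriors],
    window: int,
    step: int,
    num_classes: int,
) -> Posteriors:
    """Move a window of `window` frames over the input with stride `step`,
    run one (bidirectional) forward pass per window, sum the posteriors each
    frame receives from all windows covering it, and renormalize per frame.

    Every window covering frame i starts at or before i, so the look-ahead
    used for any frame is at most `window - 1` frames.
    """
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("require 0 < step <= window so every frame is covered")
    n = len(frames)
    if n == 0:
        return []
    acc = [[0.0] * num_classes for _ in range(n)]
    start = 0
    while True:
        end = min(start + window, n)
        post = forward_window(frames[start:end])  # one forward pass per window
        for offset, p in enumerate(post):
            row = acc[start + offset]
            for c in range(num_classes):
                row[c] += p[c]
        if end == n:
            break
        start += step
    # Renormalize the accumulated scores of each frame to a distribution.
    result = []
    for row in acc:
        total = sum(row)
        result.append([v / total for v in row])
    return result


def toy_forward(window_frames: Sequence[Frame]) -> Posteriors:
    """Hypothetical toy stand-in for a BLSTM forward pass: each output depends
    on the whole window (past and future) via its mean, then a 2-class softmax."""
    mean = sum(f[0] for f in window_frames) / len(window_frames)
    out = []
    for f in window_frames:
        z = f[0] + mean
        e0, e1 = math.exp(z), math.exp(-z)
        out.append([e0 / (e0 + e1), e1 / (e0 + e1)])
    return out


frames = [[0.1 * i] for i in range(10)]
posteriors = windowed_bidirectional_posteriors(
    frames, toy_forward, window=4, step=2, num_classes=2
)
```

With `step` smaller than `window`, most frames are covered by two or more windows. Because each window's posteriors already sum to one, summing and then renormalizing amounts to averaging them.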

DOI: 10.21437/Interspeech.2016-759

Cite as

Zeyer, A., Schlüter, R., Ney, H. (2016) Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models. Proc. Interspeech 2016, 3424-3428.
