Future Context Attention for Unidirectional LSTM Based Acoustic Model

Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai

Recently, feedforward sequential memory networks (FSMN) have shown a strong ability to model long-term past and future dependencies in speech signals without using recurrent feedback, and have achieved better performance than BLSTM in acoustic modeling. However, the encoding coefficients in FSMN are context-independent, whereas context-dependent weights are generally considered more appropriate for acoustic modeling. In this paper, we propose a novel architecture called attention-based LSTM, which uses an attention mechanism to encode future temporal context with context-dependent scores or context-dependent weights for a unidirectional LSTM based acoustic model. Preliminary experimental results on the TIMIT corpus show that the proposed attention-based LSTM achieves a phone error rate (PER) of 20.8%, compared with 20.1% for BLSTM. We also present experiments that evaluate different context attention methods.
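The contrast between context-independent and context-dependent weighting of future frames can be illustrated with a minimal sketch. This is not the formulation used in the paper: the dot-product scoring, the softmax normalization, and the names `future_context_attention`, `fixed_context`, `hidden`, and `lookahead` are all assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def future_context_attention(hidden, t, lookahead):
    """Summarize frames t+1 .. t+lookahead with weights that depend on hidden[t].

    Assumption: scores are plain dot products between the current frame and
    each future frame. The paper may use a different scoring function.
    """
    dim = len(hidden[t])
    future = hidden[t + 1 : t + 1 + lookahead]
    if not future:  # no future frames left at the end of the utterance
        return [0.0] * dim, []
    scores = [dot(hidden[t], h) for h in future]
    weights = softmax(scores)  # context-dependent weights
    context = [sum(w * h[d] for w, h in zip(weights, future)) for d in range(dim)]
    return context, weights


def fixed_context(hidden, t, coeffs):
    """Context-independent contrast: the same coefficients for every frame t."""
    dim = len(hidden[t])
    future = hidden[t + 1 : t + 1 + len(coeffs)]
    return [sum(c * h[d] for c, h in zip(coeffs, future)) for d in range(dim)]


# Toy hidden states (hypothetical values, 4 frames of dimension 2).
hidden = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
context, weights = future_context_attention(hidden, t=0, lookahead=2)
fixed = fixed_context(hidden, t=0, coeffs=[0.5, 0.5])
```

In this toy case the attention weights favor the future frame that resembles the current one, while the fixed coefficients treat both future frames equally regardless of content.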

DOI: 10.21437/Interspeech.2016-185

Cite as

Tang, J., Zhang, S., Wei, S., Dai, L. (2016) Future Context Attention for Unidirectional LSTM Based Acoustic Model. Proc. Interspeech 2016, 3394-3398.

author={Jian Tang and Shiliang Zhang and Si Wei and Li-Rong Dai},
title={Future Context Attention for Unidirectional LSTM Based Acoustic Model},
booktitle={Interspeech 2016},