Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition

Yu Zhang, Pengyuan Zhang, Yonghong Yan

Distant speech recognition is a highly challenging task due to background noise, reverberation, and speech overlap. Recently, there has been an increasing focus on attention mechanisms. In this paper, we explore an attention mechanism embedded within a long short-term memory (LSTM) based acoustic model for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). Furthermore, a multi-task learning architecture is incorporated to improve robustness, in which the network is trained to perform both a primary senone classification task and a secondary feature enhancement task. Experiments were conducted on the AMI meeting corpus. On average, our model achieved 3.3% and 5.0% relative improvements in word error rate (WER) over the LSTM baseline model in the SDM and MDM cases, respectively. In addition, the model provided an absolute WER reduction of 2–4% compared to a conventional pipeline of independent processing stages on the MDM task.
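The multi-task objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions or their weighting, so everything here is an assumption made for illustration: cross-entropy for the senone classification task, mean squared error against clean features for the enhancement task, and a hypothetical `aux_weight` that scales the secondary term. It is not the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one frame: softmax over senone logits vs. a target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def mean_squared_error(predicted, clean):
    """Mean squared error between predicted and clean feature vectors."""
    return sum((p - c) ** 2 for p, c in zip(predicted, clean)) / len(clean)


def multi_task_loss(senone_logits, senone_target, enhanced, clean, aux_weight=0.1):
    """Primary classification loss plus a weighted secondary enhancement loss.

    aux_weight is a hypothetical hyperparameter; the abstract does not state
    how the two tasks are balanced.
    """
    primary = cross_entropy(senone_logits, senone_target)
    secondary = mean_squared_error(enhanced, clean)
    return primary + aux_weight * secondary
```

In a setup like this, both tasks would share the lower network layers, and the secondary term would typically be used only during training, with the senone output alone used for decoding.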

DOI: 10.21437/Interspeech.2017-805

Cite as: Zhang, Y., Zhang, P., Yan, Y. (2017) Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition. Proc. Interspeech 2017, 3857-3861, DOI: 10.21437/Interspeech.2017-805.

  author={Yu Zhang and Pengyuan Zhang and Yonghong Yan},
  title={Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition},
  booktitle={Proc. Interspeech 2017},