Forward-Backward Attention Decoder

Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

This paper investigates how forward and backward attention can be integrated to improve the performance of attention-based sequence-to-sequence (seq2seq) speech recognition systems. In the proposed approach, speech is decoded both from left to right and from right to left using forward and backward attention vectors, and the best sentence hypothesis is searched for according to the combined probabilities provided by the decoders of the two directions. The method takes advantage of two distinct and complementary ways of extracting information from the asymmetric time structure of speech. It also mitigates a drawback of attention-based models: their tendency to output less reliable labels as the utterance becomes longer, due to error accumulation. We also show the effectiveness of multitask learning in which the forward decoder is jointly trained with backward decoding while sharing a single encoder. The proposed forward-backward decoding improved word error rates (WERs) of word-level attention models by up to 12.7% relative in speech recognition experiments using large-scale spontaneous speech corpora. The resulting models achieve much higher performance than a state-of-the-art hybrid DNN-HMM system while retaining the advantage of very low latency.
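The idea of searching for the best hypothesis under combined forward and backward probabilities can be illustrated with a minimal sketch. The abstract does not specify how the two scores are combined or how the search is organized, so everything below is an assumption made for illustration: N-best rescoring, a log-linear interpolation with a hypothetical `weight` parameter, and toy lookup tables standing in for the two decoders.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a word sequence to a log-probability (hypothetical interface).
Scorer = Callable[[Sequence[str]], float]


def combine_scores(fwd_logp: float, bwd_logp: float, weight: float = 0.5) -> float:
    """Log-linear interpolation of the two scores (an assumed combination rule)."""
    return (1.0 - weight) * fwd_logp + weight * bwd_logp


def rescore_nbest(
    candidates: List[Sequence[str]],
    forward_scorer: Scorer,
    backward_scorer: Scorer,
    weight: float = 0.5,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Rank candidates by combined score, best first."""
    scored = []
    for hyp in candidates:
        fwd = forward_scorer(hyp)
        # The backward decoder reads the hypothesis from right to left.
        bwd = backward_scorer(list(reversed(hyp)))
        scored.append((combine_scores(fwd, bwd, weight), tuple(hyp)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy log-probabilities standing in for real decoders (made-up numbers).
FWD_TABLE = {("the", "cat", "sat"): -1.2, ("the", "cat", "sad"): -1.0}
BWD_TABLE = {("sat", "cat", "the"): -0.8, ("sad", "cat", "the"): -2.0}


def toy_forward(words: Sequence[str]) -> float:
    return FWD_TABLE[tuple(words)]


def toy_backward(words: Sequence[str]) -> float:
    return BWD_TABLE[tuple(words)]


ranked = rescore_nbest(list(FWD_TABLE), toy_forward, toy_backward)
best_score, best_hyp = ranked[0]
print(best_hyp, best_score)
```

In this toy case the forward scores alone would prefer the hypothesis ending in "sad", while the combined score prefers the one ending in "sat", because the backward scorer assigns "sad" a low probability. This is only meant to show how a second decoding direction can change the ranking; it is not the search procedure used in the paper.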

 DOI: 10.21437/Interspeech.2018-1160

Cite as: Mimura, M., Sakai, S., Kawahara, T. (2018) Forward-Backward Attention Decoder. Proc. Interspeech 2018, 2232-2236, DOI: 10.21437/Interspeech.2018-1160.

  author={Masato Mimura and Shinsuke Sakai and Tatsuya Kawahara},
  title={Forward-Backward Attention Decoder},
  booktitle={Proc. Interspeech 2018},