ASR Error Correction with Augmented Transformer for Entity Retrieval

Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu

Domain-agnostic Automatic Speech Recognition (ASR) systems often mistranscribe domain-specific words, which leads to failures in downstream tasks. In this paper, we present a post-editing ASR error correction method that uses the Transformer model for entity mention correction and retrieval. Specifically, we propose a novel augmented variant of the Transformer model that encodes both the word and phoneme sequences of an entity and, during decoding, attends to phoneme information in addition to word-level information to correct mistranscribed named entities. We evaluate our method on both the ASR error correction task and the downstream retrieval task. Our method achieves a 48.08% reduction in entity error rate (EER) on the ASR error correction task and a 26.74% improvement in mean reciprocal rank (MRR) on the retrieval task. In addition, our augmented Transformer model significantly outperforms the vanilla Transformer model, with a 17.89% EER reduction and a 1.98% MRR increase, demonstrating the effectiveness of incorporating phoneme information in the correction model.
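As a rough illustration of the two metrics named in the abstract, the sketch below computes MRR and an entity error rate on toy inputs. It is not the authors' evaluation code. MRR follows its standard definition. The EER function assumes a case-insensitive exact-match comparison between each corrected entity and its reference, which the abstract does not specify. The abstract also does not state whether its percentages are relative or absolute, so `relative_change` shows only one common reading. All function names and example strings are hypothetical.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over queries.

    Each rank is the 1-based position of the correct entity in the
    retrieved list; None means it was not retrieved and contributes 0.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


def entity_error_rate(hypotheses: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of entity mentions that do not match their reference.

    Assumption: case-insensitive exact string match per entity mention.
    """
    if len(hypotheses) != len(references) or not references:
        raise ValueError("need equally sized, non-empty sequences")
    errors = sum(
        h.strip().lower() != r.strip().lower()
        for h, r in zip(hypotheses, references)
    )
    return errors / len(references)


def relative_change(baseline: float, system: float) -> float:
    """Signed relative change of `system` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (system - baseline) / baseline


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    print(mean_reciprocal_rank([1, 2, None, 4]))
    print(entity_error_rate(["taylor swift", "cold play"],
                            ["Taylor Swift", "Coldplay"]))
    print(relative_change(baseline=0.5, system=0.25))
```

Under this reading, a negative relative change in EER is an error reduction, while a positive relative change in MRR is a retrieval improvement.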

DOI: 10.21437/Interspeech.2020-1753

Cite as: Wang, H., Dong, S., Liu, Y., Logan, J., Agrawal, A.K., Liu, Y. (2020) ASR Error Correction with Augmented Transformer for Entity Retrieval. Proc. Interspeech 2020, 1550-1554, DOI: 10.21437/Interspeech.2020-1753.

  author={Haoyu Wang and Shuyan Dong and Yue Liu and James Logan and Ashish Kumar Agrawal and Yang Liu},
  title={{ASR Error Correction with Augmented Transformer for Entity Retrieval}},
  booktitle={Proc. Interspeech 2020},