Towards an ASR Error Robust Spoken Language Understanding System

Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss

A modern Spoken Language Understanding (SLU) system usually contains two sub-systems, Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU), where ASR converts the voice signal to text and NLU performs intent classification and slot filling on that text. In practice, such a decoupled ASR/NLU design facilitates fast model iteration for both components. However, it also makes the downstream NLU susceptible to errors from the upstream ASR, causing significant performance degradation. Dealing with such errors is therefore a major opportunity to improve overall SLU model performance. In this work, we first propose a general evaluation criterion that requires an ASR-error-robust model to perform well on both transcriptions and ASR hypotheses. We then introduce robustness training techniques for both the classification task and the NER task. Experimental results on two datasets show that our proposed approaches improve model robustness to ASR errors for both tasks.
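As a rough illustration of the evaluation criterion described above, the following minimal sketch scores one classifier on paired human transcriptions and ASR hypotheses and reports both numbers along with their gap. It is not the paper's metric definition: the function names (`accuracy`, `evaluate_robustness`, `keyword_intent`), the use of plain accuracy, the intent labels, and the toy utterances are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence


def accuracy(predict: Callable[[str], str],
             texts: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of texts whose predicted label matches the gold label."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        raise ValueError("cannot evaluate on an empty set")
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def evaluate_robustness(predict: Callable[[str], str],
                        transcriptions: Sequence[str],
                        hypotheses: Sequence[str],
                        labels: Sequence[str]) -> Dict[str, float]:
    """Score the same model on clean transcriptions and on ASR hypotheses.

    A model counts as robust in this sketch only if both scores are high,
    so both are reported together with the gap between them.
    """
    acc_ref = accuracy(predict, transcriptions, labels)
    acc_hyp = accuracy(predict, hypotheses, labels)
    return {
        "transcription": acc_ref,
        "asr_hypothesis": acc_hyp,
        "gap": acc_ref - acc_hyp,
    }


def keyword_intent(text: str) -> str:
    """Hypothetical stand-in classifier: a brittle keyword rule."""
    return "PlayMusic" if "play" in text.split() else "Other"


# Invented toy data: the third hypothesis contains an ASR substitution error.
transcriptions = ["play some jazz", "what is the weather", "play the beatles"]
hypotheses = ["play some jazz", "what is the weather", "pray the beatles"]
labels = ["PlayMusic", "Other", "PlayMusic"]

report = evaluate_robustness(keyword_intent, transcriptions, hypotheses, labels)
print(report)
```

In this toy setup the keyword rule is perfect on transcriptions but loses the utterance where "play" was misrecognized, which is the kind of degradation the criterion is meant to expose.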

DOI: 10.21437/Interspeech.2020-2844

Cite as: Ruan, W., Nechaev, Y., Chen, L., Su, C., Kiss, I. (2020) Towards an ASR Error Robust Spoken Language Understanding System. Proc. Interspeech 2020, 901-905, DOI: 10.21437/Interspeech.2020-2844.

  author={Weitong Ruan and Yaroslav Nechaev and Luoxin Chen and Chengwei Su and Imre Kiss},
  title={{Towards an ASR Error Robust Spoken Language Understanding System}},
  booktitle={Proc. Interspeech 2020},