Towards End-to-End Modeling of Spoken Language Understanding in a Cloud-based Spoken Dialog System

Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, David Suendermann-Oeft, Keelan Evanini, Patrick Lange, Eugene Tsuprun


Spoken language understanding (SLU) modules in dialog systems generally perform semantic decoding based on the hypotheses produced by automatic speech recognition (ASR) systems. However, when bootstrapping new spoken dialog applications from scratch in real user environments (which include, for instance, data collected over VoIP connections with poor internet quality), ASR performance can suffer due to factors such as the paucity of training data and train-test data mismatch. To address this issue, this paper proposes an ASR-free end-to-end (E2E) modeling approach to SLU for a cloud-based modular spoken dialog system (SDS). We evaluate the effectiveness of our approach on crowdsourced data collected from non-native English speakers interacting with a job-interview-based language learning application. Experimental results show that our approach to SLU performs competitively with the traditional baseline of ASR-hypothesis-based semantic classification, and that it is particularly promising and relevant in situations where the ASR performs poorly.
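To make the contrast with the ASR-hypothesis-based baseline concrete, below is a minimal sketch (not the authors' implementation) of an ASR-free E2E SLU classifier that maps a sequence of acoustic features directly to a semantic class, bypassing ASR entirely. The feature dimension, hidden size, number of classes, and model architecture are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an ASR-free end-to-end SLU classifier:
# acoustic feature sequence -> recurrent encoder -> semantic class.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class E2ESluClassifier(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=128, num_classes=10):
        super().__init__()
        # Bidirectional GRU encodes the per-frame acoustic features.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Linear layer maps the pooled utterance embedding to class logits.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) acoustic features per utterance
        encoded, _ = self.encoder(feats)
        # Mean-pool over time to obtain a fixed-length utterance embedding.
        utterance = encoded.mean(dim=1)
        return self.classifier(utterance)


if __name__ == "__main__":
    model = E2ESluClassifier()
    dummy_batch = torch.randn(4, 300, 40)   # 4 utterances, 300 frames each
    logits = model(dummy_batch)
    print(logits.shape)                     # torch.Size([4, 10])
```

The key design point this sketch illustrates is that semantic classification is trained directly on the speech signal, so it does not inherit errors from a poorly matched ASR front end; the baseline, by contrast, would classify the ASR's 1-best hypothesis text.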


Cite as: Qian, Y., Ubale, R., Ramanarayanan, V., Suendermann-Oeft, D., Evanini, K., Lange, P., Tsuprun, E. (2017) Towards End-to-End Modeling of Spoken Language Understanding in a Cloud-based Spoken Dialog System. Proc. SEMDIAL 2017 (SaarDial) Workshop on the Semantics and Pragmatics of Dialogue, 160-161.


@inproceedings{Qian2017,
  author={Yao Qian and Rutuja Ubale and Vikram Ramanarayanan and David Suendermann-Oeft and Keelan Evanini and Patrick Lange and Eugene Tsuprun},
  title={Towards End-to-End Modeling of Spoken Language Understanding in a Cloud-based Spoken Dialog System},
  year={2017},
  booktitle={Proc. SEMDIAL 2017 (SaarDial) Workshop on the Semantics and Pragmatics of Dialogue},
  pages={160--161}
}