Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems

Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara

We address character expression (e.g. an extrovert character) for spoken dialogue systems. While conventional studies have focused on controlling linguistic expressions, we focus on spoken dialogue behaviors. Specifically, the proposed model maps three character traits (extroversion, emotional instability, and politeness) to four spoken dialogue behaviors (utterance amount, backchannel, filler, and switching pause length). Collecting annotated data for training this kind of model is costly. Therefore, we propose a semi-supervised learning approach that utilizes not only character impression data (labeled data) but also corpus data (unlabeled data). Experimental results show that the proposed model expresses the target character traits through the behaviors more precisely than a baseline model that uses supervised learning only. We also investigate how to model an unlabeled behavior (e.g. speech rate) by exploiting the advantage of semi-supervised learning.

DOI: 10.21437/Interspeech.2020-2293

Cite as: Yamamoto, K., Inoue, K., Kawahara, T. (2020) Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems. Proc. Interspeech 2020, 4188-4192, DOI: 10.21437/Interspeech.2020-2293.

  author={Kenta Yamamoto and Koji Inoue and Tatsuya Kawahara},
  title={{Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems}},
  booktitle={Proc. Interspeech 2020},