Prediction and Generation of Backchannel Form for Attentive Listening Systems

Tatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel Ward

In human-human dialogue, especially in attentive listening such as counseling, backchannels are important not only for smooth communication but also for establishing rapport. Despite several studies on when to backchannel, most current spoken dialogue systems generate the same backchannel pattern every time, which gives users a monotonous impression. In this work, we investigate generating a variety of backchannel forms according to the dialogue context. We first show that appropriate backchannel forms can be chosen with machine learning, and that combining linguistic and prosodic features yields a synergy. For generation, we adopt a framework based on a set of binary classifiers so that the system can effectively make a “not-to-generate” decision. The proposed model achieves better prediction accuracy than two baselines: one that always outputs the same backchannel form and one that generates backchannels randomly. Finally, evaluations by human subjects demonstrate that the proposed method generates backchannels as naturally as human choices, giving impressions of understanding and empathy.
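The idea of using a set of binary classifiers to allow a "not-to-generate" decision can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choose_backchannel`, the labels `form_a` and `form_b`, the toy scoring functions, and the 0.5 threshold are all assumptions made for illustration. Each hypothetical classifier scores one backchannel form, and nothing is generated when no form is confident enough.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical type: each binary classifier maps a feature vector to a
# confidence in [0, 1] that its backchannel form fits the current context.
BinaryClassifier = Callable[[Sequence[float]], float]


def choose_backchannel(
    classifiers: Dict[str, BinaryClassifier],
    features: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Optional[str]:
    """Return the most confident form, or None for "not-to-generate"."""
    if not classifiers:
        return None
    # Score every form independently with its own binary classifier.
    scores = {form: clf(features) for form, clf in classifiers.items()}
    best_form = max(scores, key=lambda form: scores[form])
    # If even the best form falls below the threshold, generate nothing.
    if scores[best_form] < threshold:
        return None
    return best_form


# Toy stand-ins for trained classifiers (purely illustrative).
toy_classifiers: Dict[str, BinaryClassifier] = {
    "form_a": lambda x: 0.2 + 0.6 * x[0],
    "form_b": lambda x: 0.1 + 0.3 * x[1],
}
```

With these toy classifiers, a feature vector such as `[1.0, 0.0]` selects `form_a`, while `[0.0, 0.0]` leaves every score under the threshold and the function returns `None`, meaning no backchannel is produced.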

DOI: 10.21437/Interspeech.2016-118

Cite as

Kawahara, T., Yamaguchi, T., Inoue, K., Takanashi, K., Ward, N. (2016) Prediction and Generation of Backchannel Form for Attentive Listening Systems. Proc. Interspeech 2016, 2890-2894.

author={Tatsuya Kawahara and Takashi Yamaguchi and Koji Inoue and Katsuya Takanashi and Nigel Ward},
title={Prediction and Generation of Backchannel Form for Attentive Listening Systems},
booktitle={Interspeech 2016},