Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation

Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu

Response obligation detection, which determines whether a dialogue robot has to respond to a detected utterance, is an important function for intelligent dialogue robots. Some studies have tackled this problem; however, their applicability is limited by impractical assumptions or by the use of scenario-specific features. Some attempts have been made to widen the applicability by avoiding the text modality, which is said to be highly domain dependent, but doing so decreases detection accuracy. In this paper, we propose a novel multimodal response obligation detector that uses visual, audio, and text information for highly accurate detection, together with an unsupervised online domain adaptation method that solves the domain dependency problem. Our domain adaptation consists of adapting the logistic regression weights for every modality and assigning embeddings to new words to cope with the high domain dependency of the text modality. Experimental results on the dataset collected at a station and a commercial building showed that our method achieved high response obligation detection accuracy and was able to handle domain changes automatically.
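The abstract does not spell out the adaptation procedure, so the Python sketch below is only a rough illustration built on stated assumptions: (i) one logistic regression per modality, with the per-modality probabilities averaged (late fusion); (ii) unsupervised online adaptation by self-training, where a fused prediction beyond a confidence threshold becomes a pseudo-label for one SGD step on every modality's weights; and (iii) a new word receives the mean embedding of the known words in the same utterance. The names `ModalityLR`, `OnlineMultimodalDetector`, and `assign_new_word_embeddings` and the parameters `lr` and `confidence` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ModalityLR:
    """Binary logistic regression over one modality's features (hypothetical)."""

    def __init__(self, weights: Vector, bias: float = 0.0) -> None:
        self.w = list(weights)
        self.b = bias

    def prob(self, x: Vector) -> float:
        # P(response obligation | x) using this modality alone
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def sgd_step(self, x: Vector, y: float, lr: float) -> None:
        # Log-loss gradient: (p - y) * x for the weights, (p - y) for the bias
        err = self.prob(x) - y
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * err


class OnlineMultimodalDetector:
    """Late fusion of per-modality LRs with confidence-gated self-training.

    Assumption: the averaging fusion and the pseudo-labeling rule are
    illustrative choices, not the method described in the paper.
    """

    def __init__(self, models: Dict[str, ModalityLR],
                 lr: float = 0.05, confidence: float = 0.9) -> None:
        self.models = models
        self.lr = lr
        self.confidence = confidence

    def predict(self, feats: Dict[str, Vector]) -> float:
        # Average the per-modality probabilities (late fusion)
        probs = [m.prob(feats[name]) for name, m in self.models.items()]
        return sum(probs) / len(probs)

    def observe(self, feats: Dict[str, Vector]) -> bool:
        """Return the respond/ignore decision; adapt only on confident inputs."""
        p = self.predict(feats)
        decision = p >= 0.5
        if p >= self.confidence or p <= 1.0 - self.confidence:
            # No ground truth online: the confident decision is the pseudo-label
            pseudo_label = 1.0 if decision else 0.0
            for name, m in self.models.items():
                m.sgd_step(feats[name], pseudo_label, self.lr)
        return decision


def assign_new_word_embeddings(tokens: List[str],
                               emb: Dict[str, Vector], dim: int) -> None:
    """Give each unseen token the mean embedding of the known tokens in the
    same utterance (hypothetical heuristic); mutates `emb` in place."""
    known = [emb[t] for t in tokens if t in emb]
    for t in tokens:
        if t in emb:
            continue  # known word, or a repeat of a word assigned just now
        if known:
            emb[t] = [sum(v[i] for v in known) / len(known) for i in range(dim)]
        else:
            emb[t] = [0.0] * dim  # no context available: fall back to zeros
```

For example, a detector built as `OnlineMultimodalDetector({"audio": ModalityLR([2.0]), "visual": ModalityLR([2.0])})` returns `True` for strongly positive features and nudges both weight vectors toward that decision, while inputs near the 0.5 boundary leave the weights untouched. Gating updates on confidence is one common way to limit error feedback in self-training; the threshold trades adaptation speed against the risk of reinforcing mistakes.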

DOI: 10.21437/Interspeech.2019-1313

Cite as: Horiguchi, S., Kanda, N., Nagamatsu, K. (2019) Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation. Proc. Interspeech 2019, 4180-4184, DOI: 10.21437/Interspeech.2019-1313.

  author={Shota Horiguchi and Naoyuki Kanda and Kenji Nagamatsu},
  title={{Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation}},
  booktitle={Proc. Interspeech 2019},