Neural Whispered Speech Detection with Imbalanced Learning

Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono

In this paper, we present a neural whispered speech detection technique that performs utterance-level classification of whispered and non-whispered speech under imbalanced data distributions. Previous studies have shown that machine learning models trained on large numbers of whispered and non-whispered utterances perform remarkably well at whispered speech detection. However, it is often difficult to collect many whispered utterances. We therefore propose a method to train neural whispered speech detectors from a small number of whispered utterances combined with a large number of non-whispered utterances. In doing so, special care is taken to ensure that neural networks can be trained effectively on severely imbalanced datasets. Specifically, we use a class-aware sampling method for training the networks. To evaluate the networks, we gather test samples recorded by both condenser and smartphone microphones at different distances from the speakers to simulate practical environments. Experiments show the importance of imbalanced learning in enhancing the performance of utterance-level classifiers.
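The abstract names class-aware sampling but does not spell out the procedure. A common formulation draws a class uniformly at random and then takes the next example from a shuffled per-class list. The Python sketch below assumes that formulation. `ClassAwareSampler` and its `batch` method are hypothetical names, and the sketch is not necessarily the exact sampler the authors used.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


class ClassAwareSampler:
    """Hypothetical class-aware sampler (a sketch, not the paper's code).

    For each slot in a mini-batch, a class is chosen uniformly at random,
    then the next index is taken from that class's shuffled list. A class
    list is reshuffled when exhausted, so the minority class (e.g. whispered
    speech) is revisited far more often than under uniform sampling.
    """

    def __init__(self, labels: Sequence[Hashable], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Group dataset indices by their class label.
        self.by_class: Dict[Hashable, List[int]] = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)
        self.classes = list(self.by_class)
        # Read position within each class's current shuffled pass.
        self.pos = {c: 0 for c in self.classes}
        for c in self.classes:
            self.rng.shuffle(self.by_class[c])

    def _next_from(self, c: Hashable) -> int:
        items = self.by_class[c]
        if self.pos[c] == len(items):
            # Class list exhausted: reshuffle and start a new pass over it.
            self.rng.shuffle(items)
            self.pos[c] = 0
        idx = items[self.pos[c]]
        self.pos[c] += 1
        return idx

    def batch(self, batch_size: int) -> List[int]:
        # Uniform choice over classes, not over examples.
        return [self._next_from(self.rng.choice(self.classes))
                for _ in range(batch_size)]


if __name__ == "__main__":
    # Toy imbalance: 10 "whispered" (label 1) vs 990 "normal" (label 0).
    labels = [1] * 10 + [0] * 990
    sampler = ClassAwareSampler(labels, seed=42)
    draws = [i for _ in range(100) for i in sampler.batch(32)]
    print(sum(labels[i] for i in draws) / len(draws))  # close to 0.5
```

With this toy split, roughly half of each mini-batch comes from the whispered class, whereas uniform sampling over all utterances would yield about 1%. Cycling through a reshuffled per-class list, rather than drawing minority examples with replacement, ensures that every whispered utterance is seen once before any is repeated.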

DOI: 10.21437/Interspeech.2019-2161

Cite as: Ashihara, T., Shinohara, Y., Sato, H., Moriya, T., Matsui, K., Fukutomi, T., Yamaguchi, Y., Aono, Y. (2019) Neural Whispered Speech Detection with Imbalanced Learning. Proc. Interspeech 2019, 3352-3356, DOI: 10.21437/Interspeech.2019-2161.

  author={Takanori Ashihara and Yusuke Shinohara and Hiroshi Sato and Takafumi Moriya and Kiyoaki Matsui and Takaaki Fukutomi and Yoshikazu Yamaguchi and Yushi Aono},
  title={{Neural Whispered Speech Detection with Imbalanced Learning}},
  booktitle={Proc. Interspeech 2019},