Listen to What You Want: Neural Network-Based Universal Sound Selector

Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki


Being able to control the acoustic events (AEs) to which we want to listen would allow the development of more controllable hearable devices. This paper addresses the AE sound selection (or removal) problem, which we define as the extraction (or suppression) of all sounds belonging to one or more desired AE classes. Although this problem could be addressed with source separation followed by AE classification, such a cascade is a sub-optimal way of solving the problem. Moreover, source separation usually requires knowing the maximum number of sources, which may not be practical when dealing with AEs. In this paper, we instead propose a universal sound selection neural network that directly selects AE sounds from a mixture given user-specified target AE classes. The proposed framework can be explicitly optimized to simultaneously select sounds from multiple desired AE classes, independently of the number of sources in the mixture. We experimentally show that the proposed method achieves promising AE sound selection performance and generalizes to mixtures with numbers of sources unseen during training.
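To make the idea of class-conditioned selection concrete, here is a minimal NumPy sketch of the kind of system the abstract describes: a time-frequency mask estimator whose per-frame input is the mixture spectrogram concatenated with a multi-hot vector of the user-specified target AE classes. All dimensions, weights, and function names are hypothetical illustrations, not the paper's actual architecture; the weights below are random stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper)
N_FREQ, N_CLASSES, HIDDEN = 257, 10, 128

# Randomly initialized weights stand in for a trained mask estimator
W1 = rng.standard_normal((N_FREQ + N_CLASSES, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, N_FREQ)) * 0.01

def select_sounds(mix_spec, target_classes):
    """Estimate a time-frequency mask conditioned on a multi-hot
    vector of desired AE classes, then apply it to the mixture.

    mix_spec: (time, freq) magnitude spectrogram of the mixture
    target_classes: (n_classes,) multi-hot class-selection vector
    """
    # Broadcast the class vector to every frame and concatenate it
    # with the spectral features, so a single network handles any
    # user-specified combination of target classes.
    cond = np.tile(target_classes, (mix_spec.shape[0], 1))
    x = np.concatenate([mix_spec, cond], axis=1)
    h = np.tanh(x @ W1)
    mask = 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid mask in (0, 1)
    return mask * mix_spec  # estimated target-class spectrogram

mix = rng.random((100, N_FREQ))
sel = np.zeros(N_CLASSES)
sel[[1, 4]] = 1.0  # request two AE classes simultaneously
out = select_sounds(mix, sel)
print(out.shape)  # (100, 257)
```

Note that the same network and mask output handle any subset of classes, which is what makes the approach independent of the number of sources in the mixture: selection is driven by the conditioning vector, not by separating every source first.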


DOI: 10.21437/Interspeech.2020-2210

Cite as: Ochiai, T., Delcroix, M., Koizumi, Y., Ito, H., Kinoshita, K., Araki, S. (2020) Listen to What You Want: Neural Network-Based Universal Sound Selector. Proc. Interspeech 2020, 1441-1445, DOI: 10.21437/Interspeech.2020-2210.


@inproceedings{Ochiai2020,
  author={Tsubasa Ochiai and Marc Delcroix and Yuma Koizumi and Hiroaki Ito and Keisuke Kinoshita and Shoko Araki},
  title={{Listen to What You Want: Neural Network-Based Universal Sound Selector}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1441--1445},
  doi={10.21437/Interspeech.2020-2210},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2210}
}