Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning

Wei Xue, Ying Tong, Chao Zhang, Guohong Ding, Xiaodong He, Bowen Zhou


The performance of sound event localization and detection (SELD) degrades in source-overlapping cases, since features of different sources collapse into each other and the network often fails to learn to separate these features effectively. In this paper, by leveraging conventional microphone array signal processing to generate comprehensive representations for SELD, we propose a new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning. By using multiple beamformers to extract the signals from different DOAs, the sound field is described more diversely, and specialised representations of the target source and noises can be obtained. With labelled training data, the steering vector is estimated based on the cross-power spectra (CPS) and the signal presence probability (SPP), which eliminates the need to know the array geometry. We design two networks for sound event detection (SED) and sound source localization (SSL), and use a multi-task learning scheme for SED in which the SSL-related task acts as a regularizer. Experimental results using the database of the DCASE2019 SELD task show that the proposed method achieves state-of-the-art performance.
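The two array-processing ingredients named in the abstract, steering-vector estimation from SPP-weighted cross-power spectra and beamforming toward multiple DOAs, can be illustrated with a minimal NumPy sketch for a single frequency bin. This is not the authors' implementation; the function names, the choice of the principal eigenvector of the weighted CPS matrix as the steering-vector estimate, and the distortionless (unit-gain) beamformer weights are illustrative assumptions.

```python
import numpy as np

def estimate_steering_vector(stft, spp):
    """Estimate a steering vector from SPP-weighted cross-power spectra.

    stft: complex array (n_mics, n_frames), STFT of one frequency bin.
    spp:  array (n_frames,), per-frame signal presence probability.
    Returns a unit-norm steering vector (n_mics,), taken as the principal
    eigenvector of the SPP-weighted spatial covariance (CPS) matrix.
    """
    w = spp / (spp.sum() + 1e-12)
    # Weighted CPS matrix: sum_t w_t * x_t x_t^H (Hermitian by construction).
    cps = (stft * w) @ stft.conj().T
    _, eigvecs = np.linalg.eigh(cps)
    d = eigvecs[:, -1]          # eigh sorts eigenvalues ascending
    return d / np.linalg.norm(d)

def beamform(stft, steering_vectors):
    """Apply one beamformer per steering vector (unit gain toward each DOA).

    Returns a (n_beams, n_frames) array: one extracted signal per DOA,
    giving the diverse per-direction representations described above.
    """
    outputs = []
    for d in steering_vectors:
        w = d / np.vdot(d, d).real   # satisfies w^H d = 1 (distortionless)
        outputs.append(w.conj() @ stft)
    return np.stack(outputs)
```

In a full SELD front end, this per-bin step would be repeated over all frequency bins and over a grid of candidate DOAs, and the beamformed outputs stacked as input features for the SED and SSL networks.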


 DOI: 10.21437/Interspeech.2020-2759

Cite as: Xue, W., Tong, Y., Zhang, C., Ding, G., He, X., Zhou, B. (2020) Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning. Proc. Interspeech 2020, 5091-5095, DOI: 10.21437/Interspeech.2020-2759.


@inproceedings{Xue2020,
  author={Wei Xue and Ying Tong and Chao Zhang and Guohong Ding and Xiaodong He and Bowen Zhou},
  title={{Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={5091--5095},
  doi={10.21437/Interspeech.2020-2759},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2759}
}