An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection

Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu


Mean teacher based methods are increasingly achieving state-of-the-art performance for large-scale weakly labeled and unlabeled sound event detection (SED) tasks in recent DCASE challenges. By penalizing inconsistent predictions under different perturbations, mean teacher methods can exploit large-scale unlabeled data in a self-ensembling manner. In this paper, an effective perturbation based semi-supervised learning (SSL) method is proposed, built on the mean teacher method. Specifically, a new independent component (IC) module, designed as a combination of batch normalization and dropblock operations, is proposed to introduce perturbations at different convolutional layers. The proposed IC module can reduce the correlation between neurons and thereby improve performance. A global statistics pooling based attention module is further proposed to explicitly model inter-dependencies between the time-frequency domain and channels, using statistical information (e.g. mean, standard deviation, max) along different dimensions. This provides an effective attention mechanism to adaptively re-calibrate the output feature map. Experimental results on Task 4 of the DCASE2018 challenge demonstrate the superiority of the proposed method, which achieves an F1-score of about 39.8%, outperforming the previous winning system’s 32.4% by a significant margin.
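
As a rough illustration of the generic mean teacher idea that the abstract builds on, the sketch below shows the two ingredients usually associated with it: a teacher whose weights are an exponential moving average of the student's weights, and a consistency loss that penalizes disagreement between student and teacher predictions on the same input. This is not the authors' implementation. The function names `ema_update` and `consistency_loss`, the decay value `alpha`, the use of mean squared error, and the toy numbers are all assumptions made for illustration; the IC module and the attention module are not modeled here.

```python
def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the matching student weight.

    Hypothetical helper: `alpha` is an assumed decay rate, not a value
    taken from the paper.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions.

    MSE is a common choice for the consistency term; the paper's exact
    loss is not stated in the abstract.
    """
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy example: two weights, and per-class probabilities for one unlabeled clip
# as predicted by the student and the teacher under different perturbations.
teacher_w = [0.0, 1.0]
student_w = [1.0, 0.0]
teacher_w = ema_update(teacher_w, student_w, alpha=0.9)

loss = consistency_loss([0.8, 0.2], [0.6, 0.4])
```

In a training loop under these assumptions, the consistency term would be added to the supervised loss on labeled clips, and `ema_update` would be called after each student optimization step.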


DOI: 10.21437/Interspeech.2020-2329

Cite as: Zheng, X., Song, Y., Yan, J., Dai, L., McLoughlin, I., Liu, L. (2020) An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. Proc. Interspeech 2020, 841-845, DOI: 10.21437/Interspeech.2020-2329.


@inproceedings{Zheng2020,
  author={Xu Zheng and Yan Song and Jie Yan and Li-Rong Dai and Ian McLoughlin and Lin Liu},
  title={{An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={841--845},
  doi={10.21437/Interspeech.2020-2329},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2329}
}