Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement

Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang


In our previous study, we introduced the neural vocoder into monaural speech enhancement, in which a flow-based generative vocoder is used to synthesize speech waveforms from the Mel power spectra enhanced by a denoising autoencoder. This vocoder-based enhancement method outperforms several state-of-the-art models on a speaker-dependent dataset. However, we find that there is a large gap between the enhancement performance on trained and untrained noises. Therefore, in this paper, we propose self-supervised adversarial multi-task learning (SAMLE) to improve the noise generalization ability. In addition, we also evaluate the speaker dependence of vocoder-based methods, which is important for real-life applications. Experimental results show that the proposed SAMLE further improves the enhancement performance on both trained and untrained noises, resulting in better noise generalization. Moreover, we find that vocoder-based enhancement methods can be made speaker-independent through large-scale training.
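The two-stage pipeline the abstract describes (a denoising autoencoder that enhances Mel power spectra, followed by a neural vocoder that synthesizes the waveform) can be sketched as below. This is a minimal structural sketch only: the function names and the placeholder model internals are hypothetical and do not reproduce the authors' actual DAE or flow-based vocoder.

```python
import numpy as np

def denoising_autoencoder(noisy_mel):
    # Hypothetical stand-in for the DAE: maps a noisy Mel power
    # spectrogram (frames x mel_bins) to an enhanced estimate.
    # A placeholder clamp is used instead of a trained network.
    return np.maximum(noisy_mel, 0.0)

def flow_vocoder(mel, hop_length=256):
    # Hypothetical stand-in for the flow-based generative vocoder:
    # maps an enhanced Mel spectrogram to a time-domain waveform.
    # A trained vocoder would invert the flow conditioned on `mel`;
    # here silence of the matching length is returned as a placeholder.
    n_frames = mel.shape[0]
    return np.zeros(n_frames * hop_length, dtype=np.float32)

def vocoder_based_enhancement(noisy_mel):
    # Stage 1: enhance the Mel power spectra with the DAE.
    enhanced_mel = denoising_autoencoder(noisy_mel)
    # Stage 2: synthesize the speech waveform from the enhanced spectra.
    return flow_vocoder(enhanced_mel)

noisy_mel = np.abs(np.random.randn(100, 80))  # 100 frames, 80 Mel bins
waveform = vocoder_based_enhancement(noisy_mel)
print(waveform.shape)
```

The key design point the abstract relies on is this decoupling: enhancement quality is determined in the Mel domain, while the vocoder handles waveform generation, so phase estimation is avoided entirely.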


DOI: 10.21437/Interspeech.2020-1496

Cite as: Du, Z., Lei, M., Han, J., Zhang, S. (2020) Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement. Proc. Interspeech 2020, 3271-3275, DOI: 10.21437/Interspeech.2020-1496.


@inproceedings{Du2020,
  author={Zhihao Du and Ming Lei and Jiqing Han and Shiliang Zhang},
  title={{Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3271--3275},
  doi={10.21437/Interspeech.2020-1496},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1496}
}