Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition

Zhihao Du, Jiqing Han, Xueliang Zhang


To improve the noise robustness of automatic speech recognition (ASR), generative adversarial network (GAN) based enhancement methods are used as front-end processing. These methods comprise a single adversarial process between an enhancement model and a discriminator. In this single adversarial process, the discriminator is encouraged to find differences between enhanced and clean speech, but the distribution of clean speech is ignored. In this paper, we propose a double adversarial network (DAN) that adds another adversarial generation process (AGP), which forces the discriminator not only to find the differences but also to model that distribution. Furthermore, a functional mean square error (f-MSE) is proposed to exploit the representations learned by the discriminator. Experimental results show that AGP and f-MSE, both absent from previous GAN-based methods, are crucial for enhancement performance on the ASR task. Specifically, our DAN achieves a 13.00% relative word error rate improvement over noisy speech on the CHiME-2 test set, significantly outperforming several recent GAN-based enhancement methods.
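
For readers unfamiliar with the metric, a relative word error rate (WER) improvement is conventionally the WER reduction divided by the baseline WER, here the WER on unprocessed noisy speech. The following is a minimal sketch of that arithmetic only. The function name and the numbers are hypothetical illustrations and are not taken from the paper, which reports only the relative figure in this abstract.

```python
def relative_wer_improvement(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER improvement in percent.

    baseline_wer: WER (in percent) of the recognizer on noisy input.
    enhanced_wer: WER (in percent) of the recognizer on enhanced input.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Reduction in WER, expressed as a fraction of the baseline WER.
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical values chosen for easy arithmetic, NOT results from the paper.
example = relative_wer_improvement(baseline_wer=20.0, enhanced_wer=15.0)
print(f"relative WER improvement: {example:.2f}%")
```

Under this convention, a relative improvement says how large the reduction is in proportion to the baseline, so it should not be read as an absolute drop in WER points.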


DOI: 10.21437/Interspeech.2020-1504

Cite as: Du, Z., Han, J., Zhang, X. (2020) Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. Proc. Interspeech 2020, 309-313, DOI: 10.21437/Interspeech.2020-1504.


@inproceedings{Du2020,
  author={Zhihao Du and Jiqing Han and Xueliang Zhang},
  title={{Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={309--313},
  doi={10.21437/Interspeech.2020-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1504}
}