Generative Adversarial Network Based Acoustic Echo Cancellation

Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li

Generative adversarial networks (GANs) have become a popular research topic in speech enhancement tasks such as noise suppression. Trained in an adversarial setting, GAN-based solutions often yield strong performance. In this paper, a convolutional recurrent GAN architecture (CRGAN-EC) is proposed to address both linear and nonlinear echo scenarios. The proposed architecture is trained in the frequency domain and predicts the time-frequency (TF) mask for the target speech. Several metric loss functions are deployed and their influence on echo cancellation performance is studied. Experimental results suggest that the proposed method outperforms existing methods for unseen speakers in terms of echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ). Moreover, multiple metric loss functions provide more freedom to achieve specific goals, e.g., more echo suppression or less distortion.
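To make the abstract's two key quantities concrete, the sketch below shows how a TF mask is applied to a microphone spectrogram and how ERLE is computed. This is an illustrative toy, not the authors' implementation: the oracle ideal-ratio mask stands in for the CRGAN-EC generator's predicted mask, and all signal names and dimensions are assumptions.

```python
import numpy as np

def ideal_ratio_mask(near_mag, echo_mag, eps=1e-8):
    """Oracle TF mask: fraction of each bin's magnitude belonging to
    near-end speech. In the paper, a trained GAN predicts such a mask."""
    return near_mag / (near_mag + echo_mag + eps)

def erle_db(before, after, eps=1e-8):
    """Echo return loss enhancement in dB (higher = more echo removed),
    measured here on magnitude spectrograms for simplicity."""
    return 10.0 * np.log10((np.sum(before**2) + eps) / (np.sum(after**2) + eps))

# Toy magnitude spectrograms: 100 frames x 257 frequency bins (assumed sizes).
rng = np.random.default_rng(0)
near = np.abs(rng.normal(size=(100, 257)))        # near-end (target) speech
echo = 3.0 * np.abs(rng.normal(size=(100, 257)))  # louder acoustic echo
mic = near + echo                                 # mixture at the microphone

mask = ideal_ratio_mask(near, echo)  # values in [0, 1]; GAN output in practice
enhanced = mask * mic                # masked output keeps mostly near-end energy

print(erle_db(echo, mask * echo) > 0)  # echo energy is reduced by the mask
```

The mask is bounded in [0, 1] per TF bin, so applying it can only attenuate energy; the trade-off the paper studies (more echo suppression vs. less distortion of the near-end speech) corresponds to how aggressively the predicted mask drives echo-dominated bins toward zero.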

DOI: 10.21437/Interspeech.2020-1454

Cite as: Zhang, Y., Deng, C., Ma, S., Sha, Y., Song, H., Li, X. (2020) Generative Adversarial Network Based Acoustic Echo Cancellation. Proc. Interspeech 2020, 3945-3949, DOI: 10.21437/Interspeech.2020-1454.

@inproceedings{zhang20_interspeech,
  author={Yi Zhang and Chengyun Deng and Shiqian Ma and Yongtao Sha and Hui Song and Xiangang Li},
  title={{Generative Adversarial Network Based Acoustic Echo Cancellation}},
  booktitle={Proc. Interspeech 2020},
  year={2020},
  pages={3945--3949},
  doi={10.21437/Interspeech.2020-1454}
}