NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement

Feng Deng, Tao Jiang, Xiao-Rui Wang, Chen Zhang, Yan Li

For single-channel speech enhancement, contextual information is essential for accurate speech estimation. In this paper, to capture long-term temporal context, we treat speech enhancement as a sequence-to-sequence mapping problem and propose a noise-aware attention-gated network (NAAGN). First, by incorporating deep residual learning and dilated convolutions into the U-Net architecture, we present a deep residual U-Net (ResUNet), which significantly expands the receptive field and aggregates context information systematically. Second, an attention-gated (AG) network is integrated into the ResUNet architecture with minimal computational overhead, further increasing sensitivity to long-term context and prediction accuracy. Third, we propose a novel noise-aware multi-task loss function, named the weighted mean absolute error (WMAE) loss, which takes both the speech estimation loss and the noise prediction loss into account. Finally, the proposed NAAGN model is evaluated on the Voice Bank corpus and the DEMAND database, which have been widely used to benchmark deep-learning-based speech enhancement models. Experimental results indicate that the proposed NAAGN method achieves a larger segmental SNR improvement, better speech quality, and higher speech intelligibility than the reference methods.
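The abstract does not give the exact form of the WMAE loss, so the following is only a minimal sketch of one plausible reading: a convex combination of a speech-estimation MAE and a noise-prediction MAE. The function names, the weight `alpha`, and the assumption that the model supplies a separate noise estimate are all hypothetical and are not taken from the paper.

```python
def mean_absolute_error(estimate, target):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if not target:
        raise ValueError("inputs must be non-empty")
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def wmae_loss(est_speech, clean_speech, est_noise, true_noise, alpha=0.5):
    """Hypothetical weighted MAE: alpha weights the speech term,
    (1 - alpha) weights the noise term. The weighting scheme is an
    assumption, not the paper's stated definition."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    speech_loss = mean_absolute_error(est_speech, clean_speech)
    noise_loss = mean_absolute_error(est_noise, true_noise)
    return alpha * speech_loss + (1.0 - alpha) * noise_loss


if __name__ == "__main__":
    # Toy values standing in for spectral magnitudes or waveform samples.
    loss = wmae_loss(
        est_speech=[1.0, 2.0],
        clean_speech=[1.0, 4.0],
        est_noise=[0.5, 0.5],
        true_noise=[0.5, 1.5],
        alpha=0.5,
    )
    print(loss)
```

Under this reading, setting `alpha` to 1 recovers a plain speech-only MAE, while smaller values put more weight on how well the noise is predicted.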

 DOI: 10.21437/Interspeech.2020-1133

Cite as: Deng, F., Jiang, T., Wang, X., Zhang, C., Li, Y. (2020) NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement. Proc. Interspeech 2020, 2457-2461, DOI: 10.21437/Interspeech.2020-1133.

  author={Feng Deng and Tao Jiang and Xiao-Rui Wang and Chen Zhang and Yan Li},
  title={{NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement}},
  booktitle={Proc. Interspeech 2020},