Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention

Yan Zhao, DeLiang Wang


Background noise and room reverberation are two major distortions of the speech signal in real-world environments. Each of them degrades speech intelligibility and quality, and their combined effects are especially detrimental. In this paper, we propose a DenseUNet-based model for noisy-reverberant speech enhancement, in which a novel time-frequency (T-F) attention mechanism is introduced to efficiently aggregate contextual information among different T-F units, and a channel-wise attention mechanism is developed to merge sources of information among different feature maps. In addition, we introduce a normalization-activation strategy to alleviate the performance drop in small-batch training. Systematic evaluations demonstrate that the proposed algorithm substantially improves objective speech intelligibility and quality in various noisy-reverberant conditions, and outperforms other related methods.


DOI: 10.21437/Interspeech.2020-2952

Cite as: Zhao, Y., Wang, D. (2020) Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention. Proc. Interspeech 2020, 3261-3265, DOI: 10.21437/Interspeech.2020-2952.


@inproceedings{Zhao2020,
  author={Yan Zhao and DeLiang Wang},
  title={{Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3261--3265},
  doi={10.21437/Interspeech.2020-2952},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2952}
}