A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan


We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming to improve the quality of enhanced speech in unseen noise conditions. The NAMAN architecture consists of three parts: a main regression network, a memory block, and an attention block. First, a long short-term memory recurrent neural network (LSTM-RNN) is adopted as the main network to model the acoustic context of neighboring frames. Next, the memory block is built from an extensive set of noise feature vectors that serve as prior noise bases. Finally, the attention block serves as an auxiliary network that improves the noise awareness of the main network: it encodes dynamic noise information at the frame level through additional features obtained by weighting the existing noise basis vectors in the memory block. Our experiments show that the proposed NAMAN framework is compact and outperforms state-of-the-art dynamic noise-aware training approaches in low-SNR conditions.
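The weighting step described above can be illustrated with a minimal NumPy sketch. The abstract does not specify how the attention weights are computed, so the scaled dot-product scoring, the softmax normalization, the query projection `w_query`, the function name `attend_to_noise_memory`, and all dimensions below are assumptions chosen for illustration, not the authors' implementation. The memory here is filled with random values as a stand-in for real noise feature vectors.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attend_to_noise_memory(frames, memory, w_query):
    """Weight noise basis vectors per frame (hypothetical formulation).

    frames:  (T, D)   noisy input features, one row per frame
    memory:  (K, D_n) noise basis vectors held in the memory block
    w_query: (D, D_n) assumed projection from frame features to queries
    """
    queries = frames @ w_query                               # (T, D_n)
    scores = queries @ memory.T / np.sqrt(memory.shape[1])   # (T, K)
    weights = softmax(scores, axis=-1)                       # each row sums to 1
    noise_feats = weights @ memory                           # (T, D_n)
    return noise_feats, weights


# Toy dimensions, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
T, D, K, D_N = 5, 8, 16, 8
frames = rng.standard_normal((T, D))
memory = rng.standard_normal((K, D_N))   # placeholder for real noise bases
w_query = rng.standard_normal((D, D_N))

noise_feats, weights = attend_to_noise_memory(frames, memory, w_query)

# The frame-level noise features are appended to the input of the main network.
augmented = np.concatenate([frames, noise_feats], axis=1)    # (T, D + D_N)
```

In this sketch, `augmented` stands for the input that a main regression network such as an LSTM-RNN would receive, so each frame carries its own noise estimate rather than one fixed estimate per utterance.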


DOI: 10.21437/Interspeech.2020-2037

Cite as: Wang, Y., Du, J., Chai, L., Lee, C., Pan, J. (2020) A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement. Proc. Interspeech 2020, 4501-4505, DOI: 10.21437/Interspeech.2020-2037.


@inproceedings{Wang2020,
  author={Yu-Xuan Wang and Jun Du and Li Chai and Chin-Hui Lee and Jia Pan},
  title={{A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4501--4505},
  doi={10.21437/Interspeech.2020-2037},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2037}
}