A Convolutional Neural Network with Non-Local Module for Speech Enhancement

Xiaoqi Li, Yaxing Li, Meng Li, Shan Xu, Yuanjie Dong, Xinrong Sun, Shengwu Xiong

Convolutional neural networks (CNNs) have recently attracted increasing attention for the speech enhancement task. However, convolutional operations process only a local neighborhood (a few nearest neighboring neurons) at a time, along either the space or the time direction. Long-range dependencies can be captured only by applying convolutional operations repeatedly, which introduces computational inefficiency and optimization difficulties. Inspired by the recent impressive performance of the non-local module in many computer vision tasks, we propose a convolutional neural network with a non-local module for speech enhancement. The non-local operations capture global information in the frequency domain by passing information between distant time-frequency units. They also accept inputs of arbitrary dimension, which makes them easy to integrate into our proposed network framework. Experimental results demonstrate that the proposed method not only improves computational efficiency significantly but also outperforms the competing methods in terms of objective speech intelligibility and quality metrics.
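To illustrate how a non-local operation lets every time-frequency unit exchange information with every other unit, the following is a minimal NumPy sketch of the embedded-Gaussian non-local block commonly used in computer vision. It is not the authors' network: the function name `non_local_block`, the weight names, the feature sizes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Hypothetical non-local block on a (time, freq, channels) feature map."""
    t, f, c = x.shape
    flat = x.reshape(t * f, c)              # one row per time-frequency unit
    theta = flat @ w_theta                  # query embedding, shape (N, D)
    phi = flat @ w_phi                      # key embedding, shape (N, D)
    g = flat @ w_g                          # value embedding, shape (N, D)
    attn = softmax(theta @ phi.T, axis=-1)  # pairwise weights between all units, (N, N)
    y = attn @ g                            # each unit aggregates over all units
    z = y @ w_out + flat                    # project back and add the residual
    return z.reshape(t, f, c)               # same shape as the input

rng = np.random.default_rng(0)
c, d = 8, 4                                 # assumed channel and embedding sizes
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((c, d)) for _ in range(3))
w_out = 0.1 * rng.standard_normal((d, c))

# The same weights apply to inputs with different numbers of frames.
for shape in [(10, 16, c), (25, 16, c)]:
    x = rng.standard_normal(shape)
    z = non_local_block(x, w_theta, w_phi, w_g, w_out)
    print(shape, "->", z.shape)
```

Because the output has the same shape as the input and the weights do not depend on the number of frames or frequency bins, a block of this kind can be placed between convolutional layers without changing the surrounding layers. Note that the pairwise weight matrix grows quadratically with the number of time-frequency units.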

DOI: 10.21437/Interspeech.2019-2472

Cite as: Li, X., Li, Y., Li, M., Xu, S., Dong, Y., Sun, X., Xiong, S. (2019) A Convolutional Neural Network with Non-Local Module for Speech Enhancement. Proc. Interspeech 2019, 1796-1800, DOI: 10.21437/Interspeech.2019-2472.

  author={Xiaoqi Li and Yaxing Li and Meng Li and Shan Xu and Yuanjie Dong and Xinrong Sun and Shengwu Xiong},
  title={{A Convolutional Neural Network with Non-Local Module for Speech Enhancement}},
  booktitle={Proc. Interspeech 2019},