Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition

Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao


The elastic spatial filter (ESF), proposed in recent years, is a popular multi-channel speech enhancement front end based on a deep neural network (DNN). It is suitable for real-time processing and has shown promising automatic speech recognition (ASR) results. However, the ESF only exploits knowledge of fixed beamforming, which limits its noise reduction capability. In this paper, we propose a DNN-based generalized sidelobe canceller (GSC) that automatically tracks the target speaker’s direction in real time and uses a blocking technique to generate reference noise signals, which are then used to further reduce noise in the fixed beam pointing to the target direction. The coefficients of the proposed GSC are fully learnable, and an ASR criterion is used to optimize the entire network. Experiments with 4 channels show that the proposed GSC achieves a relative word error rate improvement of 27.0% compared to the raw observation, 20.6% compared to the oracle direction-based traditional GSC, 10.5% compared to the ESF, and 7.9% compared to the oracle mask-based generalized eigenvalue (GEV) beamformer.
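
For readers unfamiliar with the GSC structure mentioned above, the following is a minimal NumPy sketch of a traditional (non-learned) GSC for a single frequency bin. It is not the DNN-based model proposed in the paper, whose coefficients are learned with an ASR criterion. The function names, the uniform-linear-array steering vector, the `phase_step` parameter, the SVD-based blocking matrix, and the batch least-squares noise canceller with diagonal loading are all assumptions made for illustration.

```python
import numpy as np

def steering_vector(num_mics, phase_step):
    """Steering vector of a uniform linear array at one frequency bin.

    phase_step (hypothetical parameter) is the phase shift in radians
    between adjacent microphones for the direction of interest.
    """
    return np.exp(-1j * phase_step * np.arange(num_mics))

def blocking_matrix(d):
    """Return B with shape (M, M-1) whose columns are orthogonal to d,
    so that B^H d = 0 and the target is blocked from the noise references."""
    M = d.shape[0]
    # Projector onto the orthogonal complement of d.
    P = np.eye(M) - np.outer(d, d.conj()) / np.vdot(d, d)
    # The first M-1 left singular vectors span that complement.
    U, _, _ = np.linalg.svd(P)
    return U[:, :M - 1]

def gsc_output(X, d, loading=1e-6):
    """Apply a batch GSC to X, shape (M, T), the STFT frames of one bin."""
    M, T = X.shape
    w_fixed = d / M                   # delay-and-sum beam, w_fixed^H d = 1
    B = blocking_matrix(d)
    y_fixed = w_fixed.conj() @ X      # fixed beam output, shape (T,)
    U = B.conj().T @ X                # noise references, shape (M-1, T)
    # Least-squares canceller: minimize mean |y_fixed - w_a^H u|^2.
    R = U @ U.conj().T / T
    R = R + loading * np.trace(R).real / (M - 1) * np.eye(M - 1)
    p = U @ y_fixed.conj() / T
    w_a = np.linalg.solve(R, p)
    # Subtract the estimated noise from the fixed beam output.
    return y_fixed - w_a.conj() @ U
```

Because the blocking matrix removes the target direction from the reference signals, the canceller can only subtract components correlated with the noise references, which is why a GSC can reduce noise beyond what the fixed beam alone achieves when the assumed direction is accurate.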


DOI: 10.21437/Interspeech.2020-1101

Cite as: Li, G., Liang, S., Nie, S., Liu, W., Yang, Z., Xiao, L. (2020) Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. Proc. Interspeech 2020, 51-55, DOI: 10.21437/Interspeech.2020-1101.


@inproceedings{Li2020,
  author={Guanjun Li and Shan Liang and Shuai Nie and Wenju Liu and Zhanlei Yang and Longshuai Xiao},
  title={{Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={51--55},
  doi={10.21437/Interspeech.2020-1101},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1101}
}