On Nonlinear Spatial Filtering in Multichannel Speech Enhancement

Kristina Tesch, Robert Rehr, Timo Gerkmann

Using multiple microphones for speech enhancement makes it possible to exploit spatial information for improved performance. In most cases, the spatial filter is chosen to be a linear function of the input, as in the minimum variance distortionless response (MVDR) beamformer. For non-Gaussian distributed noise, however, the minimum mean square error (MMSE) optimal spatial filter may be nonlinear.
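As a point of reference for what a linear spatial filter looks like, the following is a minimal sketch of the textbook MVDR weights w = R⁻¹d / (dᴴR⁻¹d) for a single frequency bin. The function name `mvdr_weights`, the random noise covariance, and the uniform-linear-array steering vector are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one frequency bin (hypothetical helper)."""
    # Solve R x = d rather than forming the explicit inverse of R.
    r_inv_d = np.linalg.solve(noise_cov, steering)
    # Normalize so that the look direction passes undistorted (w^H d = 1).
    return r_inv_d / (steering.conj() @ r_inv_d)

# Assumed toy setup: 4 microphones, random Hermitian positive definite noise covariance.
rng = np.random.default_rng(0)
num_mics = 4
a = rng.standard_normal((num_mics, num_mics)) + 1j * rng.standard_normal((num_mics, num_mics))
noise_cov = a @ a.conj().T + num_mics * np.eye(num_mics)

# Assumed steering vector of a uniform linear array (half-wavelength spacing).
steering = np.exp(-1j * np.pi * np.arange(num_mics) * np.sin(0.3))

w = mvdr_weights(noise_cov, steering)
distortionless_gain = w.conj() @ steering  # response towards the target direction
```

The filter output for a multichannel observation `y` would be `w.conj() @ y`, which is linear in the input. A nonlinear spatial filter, by contrast, cannot be written as a fixed weight vector applied to `y`.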

Potentially, such nonlinear functional relationships could be learned by deep neural networks. However, the performance would depend on many parameters and the architecture of the neural network. Therefore, in this paper, we more generally analyze the potential benefit of nonlinear spatial filters as a function of the multivariate kurtosis of the noise distribution.
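To make the notion of multivariate kurtosis concrete, here is a small sketch of one common sample estimate, Mardia's multivariate kurtosis, for real-valued data. Whether the paper uses exactly this definition is an assumption; the function name and the test distributions are illustrative only. For a p-dimensional Gaussian the expected value is p(p + 2), so heavier-tailed noise yields larger values.

```python
import numpy as np

def mardia_kurtosis(samples):
    """Mardia's multivariate kurtosis for real-valued data of shape (N, p)."""
    centered = samples - samples.mean(axis=0)
    # Biased (maximum likelihood) sample covariance, as in Mardia's definition.
    cov = np.cov(samples, rowvar=False, bias=True)
    cov_inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every sample to the mean.
    d2 = np.einsum("ni,ij,nj->n", centered, cov_inv, centered)
    return float(np.mean(d2 ** 2))

rng = np.random.default_rng(0)
n, p = 20000, 3
gauss = rng.standard_normal((n, p))   # Gaussian reference, expected value p(p+2) = 15
heavy = rng.laplace(size=(n, p))      # heavier tails, assumed example distribution

kurt_gauss = mardia_kurtosis(gauss)
kurt_heavy = mardia_kurtosis(heavy)
```

Comparing `kurt_heavy` against the Gaussian reference value gives a rough indication of how far the noise departs from the regime in which a linear spatial filter is already MMSE-optimal.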

The results imply that using a nonlinear spatial filter is only worth the effort if the noise data follows a distribution with a multivariate kurtosis that is considerably higher than that of a Gaussian. In this case, we report a performance difference of up to 2.6 dB segmental signal-to-noise ratio (SNR) improvement for artificial stationary noise. Even for real-world noise data from the CHiME-3 dataset, we observe an advantage of 1.2 dB for the nonlinear spatial filter over the linear one, given oracle data for parameter estimation.
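The reported figures are segmental SNR improvements. A minimal sketch of one common way to compute such a measure is shown below: the SNR is computed per frame, clamped to a fixed range, and averaged. The frame length, the clamping limits of -10 dB and 35 dB, and the function name are assumptions chosen for illustration and may differ from the evaluation setup used in the paper.

```python
import numpy as np

def segmental_snr(clean, estimate, frame_len=256, lo=-10.0, hi=35.0):
    """Frame-averaged SNR in dB with per-frame clamping (assumed settings)."""
    n_frames = len(clean) // frame_len
    values = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        err = s - estimate[k * frame_len:(k + 1) * frame_len]
        # Small constant avoids division by zero and log of zero.
        snr = 10.0 * np.log10((np.sum(s ** 2) + 1e-12) / (np.sum(err ** 2) + 1e-12))
        values.append(np.clip(snr, lo, hi))
    return float(np.mean(values))

# Toy signals: a sine as "clean speech" and two versions with different noise levels.
rng = np.random.default_rng(0)
t = np.arange(16000)
clean = np.sin(2 * np.pi * 440 * t / 16000)
noise = rng.standard_normal(t.size)
noisy = clean + 0.1 * noise       # stand-in for the unprocessed input
enhanced = clean + 0.05 * noise   # stand-in for an enhanced signal (noise halved)

improvement = segmental_snr(clean, enhanced) - segmental_snr(clean, noisy)
```

In this toy case the residual noise amplitude is halved in every frame, so the improvement is about 6 dB. The difference between two such improvement values, one per filter, is the kind of quantity the abstract compares.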

DOI: 10.21437/Interspeech.2019-2751

Cite as: Tesch, K., Rehr, R., Gerkmann, T. (2019) On Nonlinear Spatial Filtering in Multichannel Speech Enhancement. Proc. Interspeech 2019, 91-95, DOI: 10.21437/Interspeech.2019-2751.

  author={Kristina Tesch and Robert Rehr and Timo Gerkmann},
  title={{On Nonlinear Spatial Filtering in Multichannel Speech Enhancement}},
  booktitle={Proc. Interspeech 2019},