Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization

Ashutosh Pandey, DeLiang Wang


It has recently been shown that deep learning based speech enhancement systems do not generalize to untrained corpora in low signal-to-noise ratio (SNR) conditions, mainly due to the channel mismatch between trained and untrained corpora. In this study, we investigate techniques to improve the cross-corpus generalization of complex spectrogram enhancement. First, we propose a long short-term memory (LSTM) network for complex spectral mapping. Evaluated on untrained noises and corpora, the proposed network substantially outperforms a state-of-the-art gated convolutional recurrent network (GCRN). Next, we examine the importance of the training corpus for cross-corpus generalization. We find that a training corpus containing utterances recorded over different channels can significantly improve performance on untrained corpora. Finally, we observe that using a smaller frame shift in the short-time Fourier transform (STFT) is a simple but highly effective technique for improving cross-corpus generalization.
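
To make the frame-shift observation concrete, the following minimal sketch shows how a smaller STFT frame shift yields more frames per utterance, and how a complex spectrogram can be represented as stacked real and imaginary parts for complex spectral mapping. The sampling rate, frame length, frame shifts, Hann window, and the helper name `stft_real_imag` are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np

def stft_real_imag(signal, frame_len, frame_shift):
    """Hypothetical helper: STFT as stacked real/imaginary parts.

    Returns an array of shape (2, n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)  # assumed analysis window
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    spec = np.fft.rfft(frames, axis=1)  # complex spectrum of each frame
    return np.stack([spec.real, spec.imag])

sample_rate = 16000  # assumed sampling rate in Hz
# One second of random noise stands in for a speech signal.
signal = np.random.default_rng(0).standard_normal(sample_rate)
frame_len = 320  # 20 ms at 16 kHz (assumed)

for frame_shift in (160, 80, 40):  # assumed shifts: 10 ms, 5 ms, 2.5 ms
    features = stft_real_imag(signal, frame_len, frame_shift)
    print(frame_shift, features.shape)
```

Halving the frame shift roughly doubles the number of frames while leaving the number of frequency bins unchanged, so the network sees a more densely overlapped view of the same signal at a higher computational cost.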


DOI: 10.21437/Interspeech.2020-2561

Cite as: Pandey, A., Wang, D. (2020) Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization. Proc. Interspeech 2020, 4511-4515, DOI: 10.21437/Interspeech.2020-2561.


@inproceedings{Pandey2020,
  author={Ashutosh Pandey and DeLiang Wang},
  title={{Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4511--4515},
  doi={10.21437/Interspeech.2020-2561},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2561}
}