Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder

Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

This paper investigates the use of time-domain convolutional denoising autoencoders (TCDAEs) with multiple channels as a method of speech enhancement. In general, denoising autoencoders (DAEs), deep learning systems that map noise-corrupted waveforms to clean waveforms, have been shown to generate high-quality signals while working in the time domain without an intermediate stage of phase modeling. Convolutional DAEs are one of the popular structures for learning a mapping between noise-corrupted and clean waveforms. Multi-channel signals are promising inputs for TCDAEs because the different times of arrival of a signal can be processed directly by their convolutional structure. Until now, however, TCDAEs have only been applied to single-channel signals. This paper explores the effectiveness of TCDAEs in a multi-channel configuration. Multi-channel TCDAEs are evaluated in multi-channel speech enhancement experiments, yielding significant improvement over single-channel DAEs in terms of signal-to-distortion ratio, perceptual evaluation of speech quality (PESQ), and word error rate.
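
The point about times of arrival can be illustrated with a small sketch. It is not the authors' architecture: the two-microphone setup, the 3-sample delay, the helper name multichannel_conv1d, and the hand-set kernel weights are all assumptions chosen for illustration, whereas a TCDAE would learn its kernels from data. The sketch only shows that a single multi-channel 1-D convolution filter can, in principle, align and combine channels that receive the same source at different times.

```python
import numpy as np

def multichannel_conv1d(x, w):
    """Apply one multi-channel 1-D filter (hypothetical helper).

    x: array of shape (channels, time), the multi-channel waveform.
    w: array of shape (channels, taps), one kernel per input channel.
    Returns a 1-D array of length time - taps + 1 ("valid" output),
    the sum over channels of each channel correlated with its kernel.
    """
    n_ch, n_t = x.shape
    n_taps = w.shape[1]
    out = np.zeros(n_t - n_taps + 1)
    for c in range(n_ch):
        # np.correlate in "valid" mode: out[t] += sum_n x[c, t + n] * w[c, n]
        out += np.correlate(x[c], w[c], mode="valid")
    return out

rng = np.random.default_rng(0)
source = rng.standard_normal(64)

# Assumed setup: microphone 1 receives the source 3 samples later than microphone 0.
delay = 3
mic0 = source
mic1 = np.concatenate([np.zeros(delay), source[:-delay]])
x = np.stack([mic0, mic1])

# Hand-set kernel (a trained model would learn such weights instead):
# tap 0 on channel 0 and tap `delay` on channel 1 pick out time-aligned samples.
w = np.zeros((2, delay + 1))
w[0, 0] = 0.5
w[1, delay] = 0.5

aligned = multichannel_conv1d(x, w)
```

With these weights the filter behaves like a delay-and-sum combiner, so aligned reproduces the leading samples of source. In a noisy setting the same structure would average uncorrelated noise across channels, which is one intuition for why multi-channel input can help a convolutional model.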

DOI: 10.21437/Interspeech.2019-3197

Cite as: Tawara, N., Kobayashi, T., Ogawa, T. (2019) Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder. Proc. Interspeech 2019, 86-90, DOI: 10.21437/Interspeech.2019-3197.

  author={Naohiro Tawara and Tetsunori Kobayashi and Tetsuji Ogawa},
  title={{Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder}},
  booktitle={Proc. Interspeech 2019},