Monaural Speech Enhancement with Dilated Convolutions

Shadi Pirhosseinloo, Jonathan S. Brumberg

In this study, we propose a novel dilated convolutional neural network for enhancing speech in noisy and reverberant environments. The proposed model incorporates dilated convolutions for tracking a target speaker through context aggregation, together with skip connections and residual learning, for mapping-based monaural speech enhancement. The performance of our model was evaluated in a variety of simulated environments with different reverberation times and quantified using two objective measures. Experimental results show that the proposed model outperforms a long short-term memory (LSTM) network, a gated residual network (GRN), and a convolutional recurrent network (CRN) in terms of objective speech intelligibility and speech quality in noisy and reverberant environments. Compared to the LSTM, CRN, and GRN models, our method generalizes better to untrained speakers and noises, and has fewer trainable parameters, resulting in greater computational efficiency.
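The core mechanism the abstract describes, stacked dilated convolutions whose exponentially growing dilation rates aggregate long-range temporal context, combined with residual and skip connections, can be illustrated with a minimal sketch. The kernel size, dilation schedule, and function names below are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of context aggregation via stacked dilated 1-D convolutions with
# residual and skip connections. Kernel size and dilation schedule are
# illustrative assumptions, not the architecture from the paper.

def dilated_conv1d(x, kernel, dilation):
    """1-D convolution with the given dilation (zero-padded, 'same' length)."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[i] * padded[t + i * dilation] for i in range(k))
            for t in range(len(x))]

def residual_block(x, kernel, dilation):
    """One dilated conv layer; returns (residual output, skip output)."""
    y = dilated_conv1d(x, kernel, dilation)
    residual = [a + b for a, b in zip(x, y)]  # residual learning
    return residual, y                         # y also feeds the skip path

def enhance(x, kernels, dilations):
    """Stack residual blocks with growing dilation; sum skip connections."""
    skips, h = [], x
    for kernel, d in zip(kernels, dilations):
        h, skip = residual_block(h, kernel, d)
        skips.append(skip)
    return [sum(vals) for vals in zip(*skips)]

# With kernel size k = 3 and dilations 1, 2, 4, the receptive field is
# 1 + (k - 1) * (1 + 2 + 4) = 15 input frames per output frame, so deep
# stacks cover long temporal contexts with few parameters.
```

In the full model, each frame would be a feature vector (e.g. a spectral magnitude frame) rather than a scalar, but the dilation, residual, and skip wiring is the same.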

DOI: 10.21437/Interspeech.2019-2782

Cite as: Pirhosseinloo, S., Brumberg, J.S. (2019) Monaural Speech Enhancement with Dilated Convolutions. Proc. Interspeech 2019, 3143-3147, DOI: 10.21437/Interspeech.2019-2782.

@inproceedings{pirhosseinloo2019monaural,
  author={Shadi Pirhosseinloo and Jonathan S. Brumberg},
  title={{Monaural Speech Enhancement with Dilated Convolutions}},
  booktitle={Proc. Interspeech 2019},
  year={2019},
  pages={3143--3147},
  doi={10.21437/Interspeech.2019-2782}
}