On Synthesis for Supervised Monaural Speech Separation in Time Domain

Jingjing Chen, Qirong Mao, Dong Liu

Time-domain approaches to speech separation have achieved great success recently. However, the sources separated by these approaches usually contain artifacts (broadband noises), especially when separating noisy mixtures. In this paper, we incorporate a synthesis approach into time-domain speech separation to address these broadband noises in the separated sources; it can be used seamlessly in a speech separation system in a 'plug-and-play' manner. By directly learning an estimate of each source in the encoded domain, the synthesis approach reduces artifacts in the estimated speech and improves separation performance. Extensive experiments with different state-of-the-art models reveal that the synthesis approach can handle noisy mixtures and is more suitable for noisy speech separation. On a new benchmark noisy dataset, the synthesis approach obtains a 0.97 dB (10.1%) relative SDR improvement and corresponding gains on various metrics without extra computation cost.
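As a hedged illustration (not the paper's code): in mask-based time-domain separation, the network predicts one mask per source that multiplies the encoded mixture, whereas the synthesis approach described above predicts each source's encoded representation directly. The array names, shapes, and per-source linear "heads" below are hypothetical stand-ins for the learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, n_frames, n_sources = 64, 100, 2

# Encoded mixture from a learned 1-D conv encoder (shape: filters x frames).
encoded_mix = rng.standard_normal((n_filters, n_frames))

# Mask-based estimation: predict a non-negative mask per source and
# multiply it element-wise with the encoded mixture.
masks = rng.random((n_sources, n_filters, n_frames))  # stand-in for network output
masked_sources = masks * encoded_mix                  # (sources, filters, frames)

# Synthesis-based estimation: predict each source's encoded representation
# directly, here via a hypothetical per-source linear head.
heads = rng.standard_normal((n_sources, n_filters, n_filters))
synth_sources = np.einsum('sij,jt->sit', heads, encoded_mix)

# Both yield one encoded estimate per source, ready for the shared decoder;
# the synthesis path is not constrained to be a re-weighting of the mixture.
assert masked_sources.shape == synth_sources.shape == (n_sources, n_filters, n_frames)
```

Because the synthesis output is unconstrained by the mixture's encoded values, it can in principle suppress broadband noise components that a multiplicative mask would necessarily pass through scaled.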

DOI: 10.21437/Interspeech.2020-1150

Cite as: Chen, J., Mao, Q., Liu, D. (2020) On Synthesis for Supervised Monaural Speech Separation in Time Domain. Proc. Interspeech 2020, 2627-2631, DOI: 10.21437/Interspeech.2020-1150.

@inproceedings{chen20_interspeech,
  author={Jingjing Chen and Qirong Mao and Dong Liu},
  title={{On Synthesis for Supervised Monaural Speech Separation in Time Domain}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2627--2631},
  doi={10.21437/Interspeech.2020-1150}
}