Concatenative Resynthesis with Improved Training Signals for Speech Enhancement

Ali Raza Syed, Viet Anh Trinh, Michael Mandel

Noise reduction in speech signals remains an important area of research with potential for high impact in speech processing domains such as voice communication and hearing prostheses. We extend our previous work in synthesis-based speech enhancement, which performs concatenative resynthesis of speech signals to produce noiseless, high-quality speech, and demonstrate significant improvements over it. Concatenative resynthesis methods perform unit selection through learned non-linear similarity functions between short chunks of clean and noisy signals. These mappings are learned using deep neural networks (DNNs) trained to predict high similarity for the exact chunk of clean speech that is contained within a chunk of noisy speech and low similarity for all other pairings. We find here that more robust mappings can be learned, with more efficient use of the available data, by selecting pairings that are not exact matches but contain similar clean speech that matches the original in terms of acoustic, phonetic, and prosodic content. The resulting system is evaluated on the small-vocabulary CHiME2-GRID corpus. By combining phonetic similarity with similarity of acoustic intensity, fundamental frequency, and periodicity, it outperforms our original baseline system in terms of intelligibility.
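The difference between the two training signals described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Chunk` fields, the `is_similar` rule, and all thresholds are hypothetical choices made only for illustration, since the abstract does not say how similarity was measured or thresholded. Row `i` of each label matrix stands for the noisy chunk whose underlying clean speech is `clean_chunks[i]`, and column `j` for a candidate clean chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical per-chunk descriptors (names and units are assumptions)."""
    phones: tuple        # phone labels covering the chunk
    intensity: float     # mean acoustic intensity, in dB
    f0: float            # mean fundamental frequency, in Hz (0.0 if unvoiced)
    periodicity: float   # degree of periodicity, between 0 and 1


def is_similar(a, b, max_db=3.0, max_f0_ratio=0.1, max_periodicity=0.2):
    """Return True if two clean chunks match phonetically and acoustically.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if a.phones != b.phones:
        return False
    if abs(a.intensity - b.intensity) > max_db:
        return False
    if abs(a.f0 - b.f0) > max_f0_ratio * max(a.f0, b.f0):
        return False
    return abs(a.periodicity - b.periodicity) <= max_periodicity


def pair_labels(clean_chunks, relaxed):
    """Build target similarity labels for every (noisy i, clean j) pairing.

    relaxed=False: only the exact pairing (i == j) is a positive example.
    relaxed=True:  pairings whose clean chunks are similar are also positive.
    """
    n = len(clean_chunks)
    labels = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            exact = i == j
            similar = relaxed and is_similar(clean_chunks[i], clean_chunks[j])
            if exact or similar:
                labels[i][j] = 1
    return labels


clean_chunks = [
    Chunk(("b", "ih"), 60.0, 120.0, 0.80),
    Chunk(("b", "ih"), 61.5, 125.0, 0.75),  # same phones, close acoustics
    Chunk(("s", "eh"), 55.0, 0.0, 0.10),    # different phones, unvoiced
]

exact_labels = pair_labels(clean_chunks, relaxed=False)
relaxed_labels = pair_labels(clean_chunks, relaxed=True)
```

Under these assumptions, the exact-match scheme yields one positive pairing per noisy chunk, while the relaxed scheme also marks the first two chunks as positives for each other, so the same data supplies more positive examples.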

DOI: 10.21437/Interspeech.2018-2439

Cite as: Syed, A.R., Trinh, V.A., Mandel, M. (2018) Concatenative Resynthesis with Improved Training Signals for Speech Enhancement. Proc. Interspeech 2018, 1195-1199, DOI: 10.21437/Interspeech.2018-2439.

  author={Ali Raza Syed and Viet Anh Trinh and Michael Mandel},
  title={Concatenative Resynthesis with Improved Training Signals for Speech Enhancement},
  booktitle={Proc. Interspeech 2018},