Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion

Tuan Dinh, Alexander Kain, Kris Tjaden


Increasing speech intelligibility for hearing-impaired listeners, and for normal-hearing listeners in noisy environments, remains a challenging problem. Spectral style conversion from habitual to clear speech is a promising approach to this problem. Motivated by the success of generative adversarial networks (GANs) in various image- and speech-processing applications, we explore the potential of conditional GANs (cGANs) to learn the mapping from habitual speech to clear speech. We evaluated the performance of cGANs on three tasks: 1) speaker-dependent one-to-one mappings, 2) speaker-independent many-to-one mappings, and 3) speaker-independent many-to-many mappings. In the first task, cGANs outperformed a traditional deep neural network mapping in terms of average keyword recall accuracy and the number of speakers with improved intelligibility. In the second task, we significantly improved the intelligibility of one of three speakers, without any source-speaker training data. In the third and most challenging task, we improved keyword recall accuracy for two of three speakers, though without statistical significance.
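The abstract does not include code; the following is a minimal sketch, in PyTorch, of the kind of frame-wise cGAN mapping it describes, where a generator converts habitual spectral frames to clear-speech frames and a discriminator is conditioned on the source frame. The feature dimensionality, layer sizes, learning rates, and the L1 regularization term are illustrative assumptions, not the authors' reported configuration.

# Minimal cGAN sketch for habitual-to-clear spectral style conversion.
# All hyperparameters below (FEAT_DIM, layer widths, lr, lambda_l1) are
# illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

FEAT_DIM = 25  # assumed spectral feature dimension per frame

# Generator: maps a habitual-speech spectral frame to a clear-speech frame.
G = nn.Sequential(
    nn.Linear(FEAT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, FEAT_DIM),
)

# Discriminator: scores a (habitual, candidate-clear) frame pair as real
# (recorded clear speech) or generated; conditioning on the habitual frame
# is what makes the GAN "conditional".
D = nn.Sequential(
    nn.Linear(2 * FEAT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 1),  # logit, paired with BCEWithLogitsLoss below
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def train_step(habitual, clear, lambda_l1=10.0):
    """One cGAN update on a batch of time-aligned (habitual, clear) frames."""
    batch = habitual.size(0)
    real, fake = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: real pairs vs. generated pairs.
    opt_d.zero_grad()
    converted = G(habitual).detach()
    d_loss = (bce(D(torch.cat([habitual, clear], dim=1)), real)
              + bce(D(torch.cat([habitual, converted], dim=1)), fake))
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator, plus L1 closeness to the target.
    opt_g.zero_grad()
    converted = G(habitual)
    g_loss = (bce(D(torch.cat([habitual, converted], dim=1)), real)
              + lambda_l1 * l1(converted, clear))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random stand-in features; real training would use
# time-aligned parallel habitual/clear spectral frames.
hab = torch.randn(32, FEAT_DIM)
clr = torch.randn(32, FEAT_DIM)
print(train_step(hab, clr))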


DOI: 10.21437/Interspeech.2020-0054

Cite as: Dinh, T., Kain, A., Tjaden, K. (2020) Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion. Proc. Interspeech 2020, 1146-1150, DOI: 10.21437/Interspeech.2020-0054.


@inproceedings{Dinh2020,
  author={Tuan Dinh and Alexander Kain and Kris Tjaden},
  title={{Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={1146--1150},
  doi={10.21437/Interspeech.2020-0054},
  url={http://dx.doi.org/10.21437/Interspeech.2020-0054}
}