Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency

Tuan Dinh, Alexander Kain, Robin Samlan, Beiming Cao, Jun Wang


Individuals who undergo a laryngectomy lose their ability to phonate. Although current treatment options allow alaryngeal speech, laryngectomees still struggle in daily communication and social life because of the low intelligibility of their speech. In this paper, we present two conversion methods for increasing the intelligibility and naturalness of speech produced by laryngectomees (LAR). The first method uses a deep neural network to predict either a binary voiced/unvoiced decision or the degree of aperiodicity. The second method uses a conditional generative adversarial network to learn the mapping from LAR speech spectra to clearly articulated speech spectra. We also create a synthetic fundamental frequency trajectory with an intonation model consisting of phrase and accent curves. For both conversion methods, objective evaluation showed that adaptation always improved the performance of the pre-trained models. In subjective testing involving four LAR speakers, we significantly improved the naturalness of two speakers and the intelligibility of one speaker.
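The abstract describes the intonation model only as a combination of phrase and accent curves. A well-known model of that shape is the Fujisaki command-response model. In that model, log F0 is a baseline plus the responses to phrase commands (impulses) and accent commands (steps). The sketch below assumes the Fujisaki formulation. The paper's exact model may differ. The constants (α = 3, β = 20, γ = 0.9), the 100 Hz baseline, and the command timings in the usage lines are illustrative assumptions, not values from the paper.

```python
import math
from dataclasses import dataclass

# Assumption: Fujisaki-style command-response model; not confirmed as the paper's exact formulation.

@dataclass
class PhraseCommand:
    onset: float      # T0, impulse time in seconds
    amplitude: float  # Ap, phrase command magnitude

@dataclass
class AccentCommand:
    onset: float      # T1, step-on time in seconds
    offset: float     # T2, step-off time in seconds (T2 > T1)
    amplitude: float  # Aa, accent command magnitude

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, clipped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def synthetic_f0(times, base_f0, phrases, accents):
    """Return an F0 contour in Hz by summing phrase and accent curves in the log-F0 domain."""
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        log_f0 += sum(p.amplitude * phrase_response(t - p.onset) for p in phrases)
        log_f0 += sum(
            a.amplitude * (accent_response(t - a.onset) - accent_response(t - a.offset))
            for a in accents
        )
        contour.append(math.exp(log_f0))
    return contour

# Hypothetical usage: 2 s at a 5 ms frame shift, one phrase command and one accent command.
times = [i * 0.005 for i in range(400)]
contour = synthetic_f0(times, 100.0, [PhraseCommand(0.0, 0.5)], [AccentCommand(0.3, 0.6, 0.3)])
```

Because both curves are added to log F0, the commands scale the contour multiplicatively around the baseline, so the contour never falls below the baseline when all amplitudes are positive.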


DOI: 10.21437/Interspeech.2020-1196

Cite as: Dinh, T., Kain, A., Samlan, R., Cao, B., Wang, J. (2020) Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency. Proc. Interspeech 2020, 4781-4785, DOI: 10.21437/Interspeech.2020-1196.


@inproceedings{Dinh2020,
  author={Tuan Dinh and Alexander Kain and Robin Samlan and Beiming Cao and Jun Wang},
  title={{Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4781--4785},
  doi={10.21437/Interspeech.2020-1196},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1196}
}