Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System

Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, Ying-Hui Lai


Voice conversion (VC) is a well-known approach to improving the communication efficiency of patients with dysarthria. In this study, we used a gated convolutional neural network (Gated CNN) with phonetic posteriorgram (PPG) features to perform VC for patients with dysarthria, with a WaveRNN vocoder used to synthesize the converted speech. In addition, two well-known deep learning-based models, a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network, were compared with the Gated CNN in the proposed VC system. Speech intelligibility, evaluated with Google ASR and a listening test, showed that the speech converted by the proposed system was more intelligible than the original dysarthric speech. Moreover, the Gated CNN model performed better than the other models and required fewer parameters than the BLSTM. These results suggest that the Gated CNN can be used as a communication assistive system to overcome the degradation of speech intelligibility caused by dysarthria.
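The abstract does not specify the internal structure of the Gated CNN. The sketch below assumes the common gated linear unit (GLU) formulation, h = (X * W + b) ⊙ σ(X * V + c). Here X is a sequence of PPG frames (time × phonetic classes), one convolution produces candidate features, and a second, sigmoid-activated convolution acts as a per-element gate. The helper names (`conv1d_same`, `gated_conv1d`, `gated_conv_params`, `blstm_params`), the weight layout, and the layer sizes in the demo are hypothetical illustrations, not the authors' configuration. The parameter counters show one way to compare a gated convolutional layer with a BLSTM layer, which is relevant to the claim that the Gated CNN needs fewer parameters.

```python
import math
from typing import List

Frames = List[List[float]]        # [time][channel], e.g. PPG frames
Kernel = List[List[List[float]]]  # [out_channel][kernel_pos][in_channel] (hypothetical layout)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(x: Frames, w: Kernel, b: List[float]) -> Frames:
    """1-D convolution over time, stride 1, zero 'same' padding (odd kernel size assumed)."""
    num_frames, in_ch = len(x), len(x[0])
    k = len(w[0])
    pad = k // 2
    out: Frames = []
    for t in range(num_frames):
        row = []
        for o in range(len(w)):
            acc = b[o]
            for j in range(k):
                src = t + j - pad
                if 0 <= src < num_frames:  # frames outside the sequence count as zeros
                    for c in range(in_ch):
                        acc += w[o][j][c] * x[src][c]
            row.append(acc)
        out.append(row)
    return out


def gated_conv1d(x: Frames, w_lin: Kernel, b_lin: List[float],
                 w_gate: Kernel, b_gate: List[float]) -> Frames:
    """GLU-style gated convolution: linear path multiplied element-wise by a sigmoid gate."""
    linear = conv1d_same(x, w_lin, b_lin)
    gate = conv1d_same(x, w_gate, b_gate)
    return [[a * sigmoid(g) for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(linear, gate)]


def gated_conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights plus biases for the linear and gate convolutions."""
    return 2 * (out_ch * k * in_ch + out_ch)


def blstm_params(input_size: int, hidden: int) -> int:
    """Two directions x four LSTM gates, counting one bias vector per gate.
    (Some libraries, e.g. PyTorch, keep two bias vectors per gate.)"""
    return 2 * 4 * (hidden * (input_size + hidden) + hidden)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the comparison.
    print("gated conv (256->256, k=3):", gated_conv_params(256, 256, 3))
    print("BLSTM (input 256, hidden 256):", blstm_params(256, 256))
```

The gate lets the network scale each time-frequency feature between 0 and 1, so it can suppress or pass information frame by frame. A plain CNN has no such mechanism, and a BLSTM needs recurrent weight matrices for each of its four gates in both directions.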


DOI: 10.21437/Interspeech.2020-1367

Cite as: Chen, C., Zheng, W., Wang, S., Tsao, Y., Li, P., Lai, Y. (2020) Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System. Proc. Interspeech 2020, 4686-4690, DOI: 10.21437/Interspeech.2020-1367.


@inproceedings{Chen2020,
  author={Chen-Yu Chen and Wei-Zhong Zheng and Syu-Siang Wang and Yu Tsao and Pei-Chun Li and Ying-Hui Lai},
  title={{Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4686--4690},
  doi={10.21437/Interspeech.2020-1367},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1367}
}