Articulatory Copy Synthesis Based on a Genetic Algorithm

Yingming Gao, Simon Stone, Peter Birkholz

This paper describes a novel approach to copy synthesis of human speech with the articulatory speech synthesizer VocalTractLab (VTL). For a given natural utterance, an appropriate gestural score (an organized pattern of articulatory movements) was obtained in two steps: initialization and optimization. In the first step, we employed a rule-based method to create an initial gestural score. In the second step, this initial gestural score was optimized by a genetic algorithm such that the cosine distance between the acoustic features of the synthetic and natural utterances was minimized. The optimization was regularized by limiting certain gestural score parameters to reasonable values during the analysis-by-synthesis procedure. The experimental results showed that, compared to a baseline coordinate descent algorithm, the genetic algorithm performed better in terms of acoustic distance. In addition, a perceptual experiment was conducted to rate the similarity between the optimized synthetic speech and the original human speech. Here, similarity scores of optimized utterances with regularization were significantly higher than those without regularization.
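The optimization loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the acoustic features, the genetic operators, or how VTL is called, so everything below is an assumption made for illustration. In particular, `toy_synthesize` is a hypothetical stand-in for "synthesize with VTL, then extract acoustic features", and the bounds passed to `genetic_search` play the role of the regularization that keeps gestural score parameters within reasonable values.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def toy_synthesize(params):
    """Hypothetical stand-in for synthesis plus feature extraction.

    A real system would render audio from a gestural score and compute
    acoustic features; here a fixed nonlinear map is used instead.
    """
    return [math.sin(p * (k + 1)) + 1.5 for k, p in enumerate(params)]


def clip(value, low, high):
    """Keep a parameter inside its allowed range (the regularization)."""
    return max(low, min(high, value))


def genetic_search(target, bounds, pop_size=30, generations=60,
                   mutation_scale=0.1, seed=0):
    """Minimize the cosine distance to `target` with a simple elitist GA."""
    rng = random.Random(seed)

    def cost(individual):
        return cosine_distance(toy_synthesize(individual), target)

    # Random initial population inside the bounds. The paper instead starts
    # from a rule-based initial gestural score.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        elite = population[: pop_size // 2]  # survivors, kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Gaussian mutation, clipped back into the allowed range.
            child = [clip(g + rng.gauss(0.0, mutation_scale * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = elite + children
    population.sort(key=cost)
    history.append(cost(population[0]))
    return population[0], history


if __name__ == "__main__":
    bounds = [(0.0, 1.0)] * 4
    target = toy_synthesize([0.2, 0.4, 0.6, 0.8])
    best, history = genetic_search(target, bounds)
    print("best parameters:", best)
    print("initial and final distance:", history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best distance in this sketch can never get worse from one generation to the next. Clipping after mutation is only one way to enforce parameter limits; a penalty term in the cost function would be an alternative.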

DOI: 10.21437/Interspeech.2019-1334

Cite as: Gao, Y., Stone, S., Birkholz, P. (2019) Articulatory Copy Synthesis Based on a Genetic Algorithm. Proc. Interspeech 2019, 3770-3774, DOI: 10.21437/Interspeech.2019-1334.

  author={Yingming Gao and Simon Stone and Peter Birkholz},
  title={{Articulatory Copy Synthesis Based on a Genetic Algorithm}},
  booktitle={Proc. Interspeech 2019},