Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis

Erica Cooper, Julia Hirschberg


We compare two approaches for training statistical parametric voices that use utterance-level acoustic and prosodic features to improve the naturalness of the resulting voices: subset adaptation, and adding new acoustic and prosodic features at the frontend. We found that labeling high, middle, or low values for a given feature at the frontend, and then choosing which setting to use at synthesis time, can produce voices rated as significantly more natural than a baseline voice that uses only the standard contextual frontend features, for both HMM-based and neural network-based synthesis.
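
As a rough illustration of the frontend labeling idea, the sketch below assigns each training utterance a low, mid, or high label for one utterance-level feature. The abstract does not say how the three levels are delimited, so the tercile cut points used here are an assumption, and the function name `label_levels` and the mean-F0 values are hypothetical.

```python
from statistics import quantiles


def label_levels(values):
    """Map each utterance-level feature value to 'low', 'mid', or 'high'."""
    # Assumption: the cut points are the terciles of the training values.
    # The paper's actual thresholds may be defined differently.
    low_cut, high_cut = quantiles(values, n=3)
    labels = []
    for value in values:
        if value <= low_cut:
            labels.append("low")
        elif value <= high_cut:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


# Hypothetical mean-F0 values (Hz) for six training utterances.
mean_f0 = [110.0, 180.0, 145.0, 205.0, 120.0, 160.0]
labels = label_levels(mean_f0)
```

In a setup like this, each label would be attached to the utterance's contextual features during training, and a single setting would then be chosen for all utterances at synthesis time.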


DOI: 10.21437/SpeechProsody.2018-160

Cite as: Cooper, E., Hirschberg, J. (2018) Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis. Proc. 9th International Conference on Speech Prosody 2018, 794-798, DOI: 10.21437/SpeechProsody.2018-160.


@inproceedings{Cooper2018,
  author={Erica Cooper and Julia Hirschberg},
  title={Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis},
  year=2018,
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},
  pages={794--798},
  doi={10.21437/SpeechProsody.2018-160},
  url={http://dx.doi.org/10.21437/SpeechProsody.2018-160}
}