This contribution presents a new approach to nonintrusive quality assessment of text-to-speech (TTS) signals. Perturbation measures, which capture the degree of excitation-specific aperiodicity in voiced speech, are investigated with respect to their quality implications for synthesized speech. Based on two independent TTS databases for which formal attribute-based listening tests have been conducted, we show that perturbation measures are sensitive to quality aspects of prosody and voice characteristics. Furthermore, a dominant dependency on TTS type, namely non-uniform unit-selection versus diphone synthesis, is identified. Considerable differences between male and female TTS samples are also observed, emphasizing the need for gender-specific quality assessment.
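As an illustration of what a perturbation measure can look like, the following minimal Python sketch computes local jitter (cycle-to-cycle variation of the pitch period) and local shimmer (cycle-to-cycle variation of the peak amplitude) for a voiced segment. The abstract does not state which perturbation measures the authors use, so jitter and shimmer are assumed here only as common examples. The function names and the input sequences are hypothetical, and the pitch periods and amplitudes are assumed to have been extracted beforehand by some pitch-marking step.

```python
import numpy as np


def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods,
    normalized by the mean period (illustrative definition)."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("need at least two pitch periods")
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))


def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive cycle peak amplitudes,
    normalized by the mean amplitude (illustrative definition)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if amplitudes.size < 2:
        raise ValueError("need at least two cycle amplitudes")
    return float(np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes))


if __name__ == "__main__":
    # Hypothetical pitch periods (seconds) and peak amplitudes of one voiced segment.
    example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
    example_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
    print("local jitter: ", local_jitter(example_periods))
    print("local shimmer:", local_shimmer(example_amplitudes))
```

A perfectly periodic excitation yields a value of zero for both measures. Larger values indicate stronger aperiodicity, which is the kind of property the study relates to perceived quality.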
Bibliographic reference. Norrenbrock, Christoph / Heute, Ulrich / Hinterleitner, Florian / Möller, Sebastian (2011): "Aperiodicity analysis for quality estimation of text-to-speech signals", In INTERSPEECH-2011, 2193-2196.