This paper analyzes the perceptual quality dimensions of state-of-the-art text-to-speech (TTS) systems. Several pretests were first conducted to determine a suitable set of attribute scales. The resulting 16 scales were then used in a semantic differential on a diverse database covering 16 different TTS systems. A subsequent multidimensional analysis (Principal Axis Factor analysis with Promax rotation) yielded three underlying quality dimensions, which were labeled naturalness, disturbances, and temporal distortions. Mapping these factors onto the perceived overall quality showed that naturalness contributes the most to the quality of TTS signals.
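The factor-extraction step named above can be illustrated with a minimal sketch of iterated principal axis factoring in NumPy. It rests on several assumptions: the ratings are synthetic random data rather than the paper's listening-test results, the function name `principal_axis_factoring` and its parameters are hypothetical, and the Promax rotation is omitted, so the loadings shown are unrotated.

```python
import numpy as np


def principal_axis_factoring(X, n_factors, n_iter=100, tol=1e-6):
    """Unrotated principal axis factoring (hypothetical helper).

    X is an (n_ratings x n_scales) matrix. Returns the loading matrix
    (n_scales x n_factors) and the estimated communalities.
    """
    R = np.corrcoef(X, rowvar=False)
    # Initial communalities: squared multiple correlation of each scale.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    loadings = None
    for _ in range(n_iter):
        # Replace the unit diagonal with the current communalities.
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        # Keep the n_factors largest eigenvalues (eigh sorts ascending).
        idx = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[idx], 0.0, None)
        loadings = eigvecs[:, idx] * np.sqrt(lam)
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Synthetic stand-in data (assumption): 500 ratings on 16 scales,
# generated from 3 latent factors plus unique noise.
rng = np.random.default_rng(0)
true_loadings = rng.normal(size=(16, 3))
factors = rng.normal(size=(500, 3))
ratings = factors @ true_loadings.T + rng.normal(scale=1.5, size=(500, 16))

loadings, h2 = principal_axis_factoring(ratings, n_factors=3)
print(loadings.shape)
```

In practice an oblique rotation such as Promax would be applied to these loadings before interpreting and labeling the factors; a dedicated statistics package is the more usual route for that step.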
Bibliographic reference. Hinterleitner, Florian / Möller, Sebastian / Norrenbrock, Christoph / Heute, Ulrich (2011): "Perceptual quality dimensions of text-to-speech systems", In INTERSPEECH-2011, 2177-2180.