12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Perceptual Quality Dimensions of Text-to-Speech Systems

Florian Hinterleitner (1), Sebastian Möller (1), Christoph Norrenbrock (2), Ulrich Heute (2)

(1) Deutsche Telekom Laboratories, Germany
(2) Christian-Albrechts-Universität zu Kiel, Germany

This paper analyzes the perceptual quality dimensions of state-of-the-art text-to-speech (TTS) systems. Several pretests were conducted to determine a suitable set of attribute scales. The resulting 16 scales were used in a semantic differential on a diverse database containing 16 different TTS systems. A subsequent multidimensional analysis (principal axis factor analysis with Promax rotation) yielded three underlying quality dimensions, labeled naturalness, disturbances, and temporal distortions. Mapping these factors onto the perceived overall quality revealed that naturalness contributes the most to the quality of TTS signals.
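
The abstract does not state how the factors were mapped onto overall quality. One common choice for such a mapping is multiple linear regression of the overall quality rating on the factor scores, and the sketch below illustrates that idea only. Everything in it is an assumption made for illustration: the data are synthetic, the weights in `TRUE_WEIGHTS` are invented and are not results from the paper, and the helper names `solve_linear_system` and `fit_least_squares` are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(factor_scores, quality):
    """Return [intercept, w1, w2, ...] minimizing the squared error."""
    rows = [[1.0] + list(s) for s in factor_scores]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * q for r, q in zip(rows, quality)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic factor scores for 200 hypothetical stimuli (not the paper's data).
rng = random.Random(0)
scores = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]

# Invented weights, chosen only so that the first factor dominates.
TRUE_WEIGHTS = [3.0, 0.8, 0.3, 0.2]
quality = [
    TRUE_WEIGHTS[0] + sum(w * s for w, s in zip(TRUE_WEIGHTS[1:], row))
    for row in scores
]

weights = fit_least_squares(scores, quality)
names = ["intercept", "naturalness", "disturbances", "temporal distortions"]
for name, w in zip(names, weights):
    print(f"{name}: {w:.3f}")
```

In such a fit, the relative magnitude of each weight indicates how strongly the corresponding dimension contributes to the predicted overall quality. With real listening-test data the factor scores would come from the factor analysis and the fit would include residual error.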

Bibliographic reference. Hinterleitner, Florian / Möller, Sebastian / Norrenbrock, Christoph / Heute, Ulrich (2011): "Perceptual quality dimensions of text-to-speech systems", in INTERSPEECH-2011, 2177-2180.