12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Perceptual Quality Dimensions of Text-to-Speech Systems

Florian Hinterleitner (1), Sebastian Möller (1), Christoph Norrenbrock (2), Ulrich Heute (2)

(1) Deutsche Telekom Laboratories, Germany
(2) Christian-Albrechts-Universität zu Kiel, Germany

The aim of this paper is to analyze the perceptual quality dimensions of state-of-the-art text-to-speech (TTS) systems. To this end, several pretests were conducted to determine a suitable set of attribute scales. The resulting 16 scales were used in a semantic differential test on a diverse database containing 16 different TTS systems. A subsequent multidimensional analysis (principal axis factor analysis with Promax rotation) yielded three underlying quality dimensions, which were labeled naturalness, disturbances, and temporal distortions. A mapping of these factors onto the perceived overall quality revealed that naturalness contributes the most to the quality of TTS signals.
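
As an illustration of the factor-extraction step named in the abstract, the following is a minimal sketch of iterated principal axis factoring on a correlation matrix of 16 rating scales. It is not the authors' implementation: the ratings are synthetic, the function name `principal_axis_factoring`, the sample size, the loading pattern, and the noise level are assumptions made for this example, and the Promax rotation that would normally follow is omitted.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, n_iter=200, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix R.

    Returns unrotated loadings (n_scales x n_factors) and communalities.
    Hypothetical helper written for illustration only.
    """
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        # Replace the unit diagonal by the current communalities.
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        # Keep the n_factors largest eigenvalues and their eigenvectors.
        idx = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[idx], 0.0, None)
        loadings = eigvecs[:, idx] * np.sqrt(lam)
        h2_new = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return loadings, h2


# Synthetic stand-in for semantic-differential ratings (assumed, not real data):
# 2000 rated stimuli, 16 scales, 3 latent dimensions.
rng = np.random.default_rng(0)
n_stimuli, n_scales, n_factors = 2000, 16, 3
true_loadings = np.zeros((n_scales, n_factors))
true_loadings[0:6, 0] = 0.8    # scales driven by the first latent dimension
true_loadings[6:11, 1] = 0.8   # scales driven by the second latent dimension
true_loadings[11:16, 2] = 0.8  # scales driven by the third latent dimension
latent = rng.standard_normal((n_stimuli, n_factors))
noise = 0.5 * rng.standard_normal((n_stimuli, n_scales))
ratings = latent @ true_loadings.T + noise

R = np.corrcoef(ratings, rowvar=False)
loadings, communalities = principal_axis_factoring(R, n_factors)

# Off-diagonal residuals show how well three factors reproduce the correlations.
residual = R - loadings @ loadings.T
off_diag = residual[~np.eye(n_scales, dtype=bool)]
print("loadings shape:", loadings.shape)
print("mean |off-diagonal residual|:", np.abs(off_diag).mean())
```

The unrotated loadings returned here do not yet show a simple structure; an oblique rotation such as Promax would be applied afterwards so that each scale loads mainly on one factor, which is what makes labeling the dimensions possible.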

