First European Conference on Speech Communication and Technology

Paris, France
September 27-29, 1989

Direct Scaling of the Performance of Text-to-Speech Synthesis Systems

Chaslav V. Pavlovic, Mario Rossi, Robert Espesser

LA 261, CNRS, Parole et Langage, Institut de Phonétique, Université de Provence, Aix-en-Provence, France

As text-to-speech systems develop, it becomes necessary to compare competing solutions and to evaluate whether a change in the synthesis procedure affects the listener's attitude toward the system. Because no physical measurements yield indices that quantify the perceptual attributes of synthesized speech, psychophysical tests must be used. The present study assesses the effectiveness of a magnitude estimation task for measuring listeners' impressions of synthesized speech. In particular, it focuses on acceptability (i.e., the users' overall satisfaction with the communication situation), intelligibility (i.e., how identifiable the linguistic message is), and naturalness (i.e., how much the system sounds like a normal human talker). The study consists of three experiments, which are described further in the text. Depending on the experiment, four or seven synthesizers were evaluated in quiet and in one or two types of external distortion (noise).

