Text-to-speech synthesisers were assessed in Italian and German using the paired comparison method. Stimuli were obtained by cross-transferring the prosodic model from one system to the other, so that various combinations of prosodic component and acoustic decoder could be compared with one another. It was thus possible to compare the overall quality of the same prosodic model applied to two different acoustic decoders, the overall quality of the same acoustic decoder driven by two different prosodic models, and other combinations. Ten sentences were synthesised by each system. We present and discuss the results obtained in Italian and German with 16 normal-hearing subjects who participated in each experiment. First, the proportion of equivalent responses from the two tests shows that listeners are much more confident in their judgments when comparing acoustic differences than when comparing prosodic differences. Second, the differences between two prosodic models compared ceteris paribus appear to depend strongly on the acoustic decoder used, whereas the differences between two acoustic decoders depend somewhat on the prosodic model used. These observations lead us to recommend care when comparing text-to-speech synthesisers that differ in both components, since the effects of the two components on listeners' judgments are not independent and cannot be assessed separately.
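The cross-transfer design described above can be illustrated with a minimal sketch. It assumes two systems per language, each contributing one prosodic model and one acoustic decoder, and a response scale that includes an "equal" option; the labels `prosody_A`, `decoder_A`, the function names, and the response codes are hypothetical and do not come from the paper.

```python
from itertools import combinations, product

# Hypothetical labels: the abstract does not name the systems.
PROSODIC_MODELS = ["prosody_A", "prosody_B"]
ACOUSTIC_DECODERS = ["decoder_A", "decoder_B"]


def build_pairs():
    """Enumerate all pairs of (prosody, decoder) conditions and label what differs."""
    conditions = list(product(PROSODIC_MODELS, ACOUSTIC_DECODERS))
    pairs = []
    for left, right in combinations(conditions, 2):
        if left[0] == right[0]:
            kind = "acoustic"  # same prosodic model, different decoders
        elif left[1] == right[1]:
            kind = "prosodic"  # same decoder, different prosodic models
        else:
            kind = "both"      # the two stimuli differ in both components
        pairs.append((left, right, kind))
    return pairs


def equivalent_proportion(responses):
    """Share of 'equal' judgments among responses coded 'first', 'second' or 'equal'."""
    if not responses:
        raise ValueError("no responses given")
    return responses.count("equal") / len(responses)


if __name__ == "__main__":
    for left, right, kind in build_pairs():
        print(left, "vs", right, "->", kind)
```

Under these assumptions, a higher value of `equivalent_proportion` for one kind of pair than for another would correspond to listeners being less confident about that kind of difference.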
Bibliographic reference. Benoît, Christian / Emerard, Francoise / Schnabel, Betina / Tseva, A. (1991): "Quality comparisons of prosodic and of acoustic components of various synthesisers", In EUROSPEECH-1991, 875-878.