Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
In the past few years a suite of tests has been developed, tested and implemented within the Esprit-SAM project to evaluate the performance of rule synthesizers in many different languages. Since no rule synthesizer has yet achieved fully acceptable segmental intelligibility or reasonable prosody, systematic diagnostic evaluation and comparative tests remain necessary. The SAM segmental test, consisting of CV, VC, and VCV nonsense words that follow the phonotactic constraints of each language, is a suitable means to that end. Other word types, for instance those including consonant clusters, can easily be added. Next there is the SUS test (semantically unpredictable sentences), in which five common grammatical structures are defined, together with a list of words per word type, allowing one to generate ever-different test material. We also defined an overall quality test using either a 20-point categorical estimation procedure or a magnitude estimation procedure in which subjects adjust the length of a line segment according to their quality judgment. We are developing prosodic tests in which the form and function of prosodic characteristics in various word and sentence types are evaluated for their appropriateness. The frequently neglected variability across listeners and between tests is also studied, as are ways to measure speech quality objectively (by physical means instead of listeners' judgments). Various test procedures have been implemented on a PC in a software package called SOAP.
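As an illustration of how segmental test material of the kind described above might be generated, the following is a minimal sketch, not the SAM or SOAP implementation. The toy phoneme inventories, the function name `nonsense_words`, and the `is_legal` filter are all hypothetical; a real test would use the full inventory and phonotactic constraints of the language under evaluation, and may not cross every vowel with every consonant as done here.

```python
from itertools import product

# Hypothetical toy inventories, chosen only for illustration.
CONSONANTS = ["p", "t", "k", "s", "m"]
VOWELS = ["a", "i", "u"]


def nonsense_words(consonants, vowels, is_legal=lambda segments: True):
    """Build CV, VC and VCV items by crossing the inventories.

    `is_legal` is an assumed hook standing in for language-specific
    phonotactic constraints: it receives a tuple of segments and
    returns False for sequences that should be excluded.
    """
    slots = {"C": consonants, "V": vowels}
    items = {}
    for template in ("CV", "VC", "VCV"):
        pools = [slots[symbol] for symbol in template]
        items[template] = [
            "".join(segments)
            for segments in product(*pools)
            if is_legal(segments)
        ]
    return items


items = nonsense_words(CONSONANTS, VOWELS)
```

A language-specific constraint can be passed as a filter, for example `nonsense_words(CONSONANTS, VOWELS, is_legal=lambda s: s[-1] != "m")` to exclude items ending in the (hypothetical) disallowed final consonant.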
Bibliographic reference. Pols, Louis C. W. / SAM Partners (1992): "Multi-lingual synthesis evaluation methods", In ICSLP-1992, 181-184.