Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Voice Quality of Synthetic Speech: Representation and Evaluation

Louis C. W. Pols

Institute of Phonetic Sciences / IFOTT, Univ. of Amsterdam, The Netherlands

In most present-day rule synthesis systems (whether allophone-based or using concatenative units), voice quality is generally limited to one voice and one speaking style. Although more knowledge has by now been gathered about how to produce natural-sounding female voices, how to include some emotional elements in synthetic speech, and how to produce more acceptable prosody, a controlled and optimized voice quality of synthetic speech is so far much more a research goal than a reality. This does not, of course, preclude good use of present-day synthetic speech for specific applications. But even under those conditions, the presently available methods for evaluating speech quality barely touch on voice quality. Most emphasis is either on phonemic quality, expressed in phoneme, word, or sentence intelligibility measures, or on very global measures such as overall quality, naturalness, or acceptability scores, generally collected via scale judgments. We present some possibilities for diagnostic and functional evaluation of the voice quality of synthetic speech for specific applications.

Bibliographic reference. Pols, Louis C. W. (1994): "Voice quality of synthetic speech: representation and evaluation", in ICSLP-1994, 1443-1446.