The quality of every speech synthesis system depends strongly on the quality of its speech inventory. In this paper we discuss the advantages and disadvantages of the speech units most commonly used in speech synthesis systems. For this purpose we produced three speech inventories for our TTS-system TUBSY: one containing phonemes, one based on phoneme-clusters, and one containing diphones. The resulting speech quality was evaluated with a cluster-identification test. Based on the results of this test, we propose a new kind of speech unit that combines the advantages of phoneme-clusters and diphones. Informal listening tests show that these units provide a significant improvement in speech quality over phoneme-clusters and achieve speech quality similar to that of diphones, but with a much smaller inventory.
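To make the inventory-size trade-off between phonemes and diphones concrete, the following minimal Python sketch (not taken from TUBSY) splits a phoneme sequence into diphones, i.e. units that run from the middle of one phoneme into the middle of the next, and compares upper bounds on inventory size. The phoneme symbols, the `_` silence marker, the figure of 40 phonemes, and the function names are illustrative assumptions.

```python
# Hypothetical illustration: symbols, the "_" silence marker and the
# phoneme count below are assumptions, not values from the paper.

def diphones(phonemes):
    """Split a phoneme sequence into diphones, padding with silence '_' at both ends."""
    padded = ["_"] + list(phonemes) + ["_"]
    # Each diphone covers the transition between two adjacent phonemes.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_inventory_sizes(n_phonemes):
    """Upper bounds: one unit per phoneme vs. one unit per ordered phoneme pair."""
    return {"phonemes": n_phonemes, "diphones": n_phonemes ** 2}


print(diphones(["h", "aU", "s"]))  # ['_-h', 'h-aU', 'aU-s', 's-_']
print(max_inventory_sizes(40))     # {'phonemes': 40, 'diphones': 1600}
```

Because many phoneme pairs never occur in German, a real diphone inventory stays below this quadratic bound; the sketch only shows why diphone inventories are much larger than phoneme inventories.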
Bibliographic reference. Jürgens, C. / Wunderlich, M. (1995): "A comparison of different speech units for the German TTS-system TUBSY", In EUROSPEECH-1995, pp. 1105-1108.