Second ESCA/IEEE Workshop on Speech Synthesis
September 12-15, 1994
The ATR μ-talk non-uniform unit system of concatenative synthesis has been shown to produce very high quality synthetic speech, but is slow and expensive in memory. Furthermore, it was designed for Japanese and is not directly applicable to other languages. This paper shows how the μ-talk principle can be generalised for multi-lingual synthesis, and describes methods for database pruning and faster unit selection that overcome the main criticisms levelled against the Japanese version. To reduce selection time, we substitute prosodic selection criteria for the acoustic measures, and show that these result in faster unit selection that minimises post-processing of the speech waveform and thus reduces distortion in the output speech. To reduce database size, we generate a rectangular array of non-uniform segments to a predetermined depth. This preserves sparse units and maximises tokens of the common sounds of the language.
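As an illustration only, the depth-limited pruning mentioned above might be sketched as follows. This is one possible reading of the abstract, not the paper's actual procedure: the function name `prune_to_depth`, the `(unit_type, token_id)` representation, and the rule of keeping the first tokens encountered are all assumptions made for the example. The paper may well choose which tokens to retain on other grounds, such as prosodic variety.

```python
from collections import defaultdict


def prune_to_depth(tokens, depth):
    """Keep at most `depth` tokens of each unit type (hypothetical sketch).

    `tokens` is an iterable of (unit_type, token_id) pairs. The result maps
    each unit type to a list of at most `depth` token ids, so the table is
    roughly rectangular: one row per unit type, up to `depth` columns.
    """
    kept = defaultdict(list)
    for unit_type, token_id in tokens:
        # Assumption: retain the first `depth` tokens seen for each type.
        if len(kept[unit_type]) < depth:
            kept[unit_type].append(token_id)
    return dict(kept)


# Toy inventory: "a" is common, "kya" occurs only once.
tokens = [("a", 0), ("a", 1), ("a", 2), ("ka", 3), ("kya", 4), ("a", 5), ("ka", 6)]
pruned = prune_to_depth(tokens, depth=2)
```

In this toy inventory the single token of the rare unit "kya" survives, while the common unit "a" is capped at two tokens, which is the general effect the abstract describes: sparse units are preserved and the database size is bounded by the chosen depth.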
Bibliographic reference. Campbell, Nick (1994): "Prosody and the selection of units for concatenation synthesis", In SSW2-1994, 61-64.