A synthesis-by-rule system based on the selective use of non-uniform synthesis units has been developed. The system uses a natural speech database and an algorithm that searches the database for the optimal speech segment to use as a synthesis unit. Because synthesis units are used flexibly, the scheme has great advantages, especially in expressing many coarticulatory variations. However, because the system has a great number of units, precise manipulation of speech segments is difficult. In this paper, we discuss precise manipulation using a multi-level acoustic-phonetic transcription of the unit database, which consists of the following five levels: phonemic symbol, acoustic event, allophonic variation, inseparable phenomena, and vowel center. Based on this transcription, speech segments of the database can be utilized as synthesis units by adapting their acoustic-phonetic characteristics.
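As a rough illustration of what a five-level transcription of one database segment might look like in code, the following is a minimal sketch and not the authors' implementation. Only the five level names are taken from the abstract; the class names, the millisecond time spans, the label symbols, and the lookup method are all hypothetical.

```python
from dataclasses import dataclass, field

# The five level names follow the abstract; everything else here is assumed.
LEVELS = (
    "phonemic_symbol",
    "acoustic_event",
    "allophonic_variation",
    "inseparable_phenomena",
    "vowel_center",
)


@dataclass(frozen=True)
class Label:
    level: str       # one of LEVELS
    symbol: str      # hypothetical label text, e.g. "k" or "burst"
    start_ms: float  # hypothetical time span within the segment
    end_ms: float    # equal to start_ms for a point-like label


@dataclass
class Segment:
    labels: list = field(default_factory=list)

    def add(self, level, symbol, start_ms, end_ms):
        """Attach one label to the segment after basic validation."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if end_ms < start_ms:
            raise ValueError("end_ms must not precede start_ms")
        self.labels.append(Label(level, symbol, start_ms, end_ms))

    def view(self, time_ms):
        """Return {level: symbol} for every label whose span covers time_ms.

        Spans are treated as closed intervals so that point-like labels
        (start_ms == end_ms) can be found. At an exact boundary shared by
        two labels of the same level, the later-added label wins.
        """
        return {
            lab.level: lab.symbol
            for lab in self.labels
            if lab.start_ms <= time_ms <= lab.end_ms
        }


def example_segment():
    """Build a made-up /ka/ segment; all symbols and times are illustrative."""
    seg = Segment()
    seg.add("phonemic_symbol", "k", 0, 80)
    seg.add("phonemic_symbol", "a", 80, 200)
    seg.add("acoustic_event", "closure", 0, 50)
    seg.add("acoustic_event", "burst", 50, 80)
    seg.add("vowel_center", "a", 140, 140)
    return seg
```

Querying such a structure at a given time, for example `example_segment().view(60)`, returns the labels active on each level at that instant, which is the kind of information a unit-adaptation step could consult before modifying a segment.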
Bibliographic reference. Takeda, Kazuya / Abe, Katsuo / Sagisaka, Yoshinori / Kuwabara, Hisao (1989): "Adaptive manipulation of non-uniform synthesis units using multi-level unit transcription", In EUROSPEECH-1989, 2195-2198.