This paper describes synthesis units for text-to-speech synthesis. In concatenative synthesis, spectral control is performed by concatenating synthesis units, so the choice of unit is a very important problem. Our basic idea is to introduce prosodic features into the control of spectral features in order to produce natural-sounding synthetic speech. In this paper, we report the results of basic analytic experiments. Using classification methods that take prosodic features into account, we clustered CV syllables extracted from sentence utterances. The results show that there is a clear relation between prosodic features and spectral features. We also report the quality of synthetic speech, evaluated by the distortion between natural speech and the CV syllable units obtained by the proposed clustering method. These results strongly suggest that a spectral control method that considers not only phonetic context but also prosodic features can improve the quality of synthetic speech in a text-to-speech system.
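As a rough illustration of the kind of comparison described above, the following sketch groups toy CV syllable tokens once by syllable identity alone and once by syllable identity plus a coarse prosodic class, then measures the mean distortion between each token and its unit centroid. Everything in it is an assumption made for illustration: the token values are invented, the prosodic feature is reduced to a single F0 value split at a hypothetical threshold, the distortion is a squared Euclidean distance, and the fixed two-way split stands in for the clustering method, which the abstract does not specify.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tokens: (CV syllable, F0 in Hz, spectral feature vector).
# All values are invented for illustration; they are not data from the paper.
TOKENS = [
    ("ka", 110.0, (1.0, 2.0)),
    ("ka", 115.0, (1.2, 2.1)),
    ("ka", 190.0, (2.0, 3.0)),
    ("ka", 200.0, (2.2, 3.1)),
    ("mi", 105.0, (0.5, 1.0)),
    ("mi", 120.0, (0.6, 1.2)),
    ("mi", 210.0, (1.5, 2.0)),
    ("mi", 220.0, (1.6, 2.2)),
]

F0_THRESHOLD = 150.0  # assumed boundary between "low" and "high" pitch


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(mean(dim) for dim in zip(*vectors))


def sq_dist(a, b):
    """Squared Euclidean distance, used here as a stand-in distortion."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_distortion(tokens, key):
    """Group tokens by key, represent each group by its centroid, and
    return the average distortion between tokens and their centroid."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[key(tok)].append(tok[2])
    total = 0.0
    for vectors in groups.values():
        c = centroid(vectors)
        total += sum(sq_dist(v, c) for v in vectors)
    return total / len(tokens)


def phonetic_key(tok):
    """One unit per CV syllable, ignoring prosody."""
    return tok[0]


def prosodic_key(tok):
    """One unit per (CV syllable, coarse pitch class)."""
    return (tok[0], "high" if tok[1] >= F0_THRESHOLD else "low")


d_phonetic = mean_distortion(TOKENS, phonetic_key)
d_prosodic = mean_distortion(TOKENS, prosodic_key)
print(f"syllable only:       {d_phonetic:.4f}")
print(f"syllable + prosody:  {d_prosodic:.4f}")
```

In this toy data the spectral vectors were deliberately made to shift with pitch, so the prosody-aware units give a lower distortion; the sketch only shows how such a comparison could be set up and says nothing about the magnitude of the effect in real speech.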
Bibliographic reference. Ishikawa, Yasushi / Nakajima, Kunio (1995): "Speech synthesis by rule based on synthesis units considering prosodic features", In EUROSPEECH-1995, 1827-1830.