First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes a model of the voice source and of formant trajectories that can be used to develop a high-fidelity speech synthesizer. A polynomial model is used to generate the glottal source. Formant trajectories are modelled as the sum of two kinds of functions: one represents vowel-to-vowel transitions, and the other represents the effects of surrounding consonants on the formants. The intelligibility and fidelity of speech synthesized with the model were tested at slow and fast speaking rates. Compared with speech obtained by an analysis-synthesis method, the model slightly improved the intelligibility of vowels at both speaking rates and of consonants at the slow rate. For consonants at the fast rate, the model reduced intelligibility by 6%. The polynomial model of the glottal source could reproduce, to some extent, subtle differences in voice quality among vowels uttered at various pitch and loudness levels. The model was found to be useful as a high-fidelity synthesizer with variable speaking rate.
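Illustrative sketch (not taken from the paper): the additive structure of the formant model, a vowel-to-vowel transition plus a consonant-induced perturbation, might be expressed as follows. The abstract does not specify the functional forms, so the logistic transition, the Gaussian-shaped perturbation, and all function names, parameter names, and numeric values below are assumptions chosen only for illustration.

```python
import math


def vowel_transition(t, f_start, f_end, t_mid, tau):
    """Smooth vowel-to-vowel formant transition in Hz.

    ASSUMPTION: a logistic curve centred at t_mid with time constant tau.
    The paper's actual transition function is not given in the abstract.
    """
    return f_start + (f_end - f_start) / (1.0 + math.exp(-(t - t_mid) / tau))


def consonant_perturbation(t, depth, t_c, width):
    """Local formant deflection in Hz caused by a consonant at time t_c.

    ASSUMPTION: a Gaussian-shaped bump of peak size `depth`.
    """
    return depth * math.exp(-0.5 * ((t - t_c) / width) ** 2)


def formant_trajectory(t, f_start, f_end, t_mid, tau, depth, t_c, width):
    """Formant frequency at time t: the sum of the two component functions."""
    return (vowel_transition(t, f_start, f_end, t_mid, tau)
            + consonant_perturbation(t, depth, t_c, width))


if __name__ == "__main__":
    # Hypothetical first-formant values for a vowel-consonant-vowel sequence.
    params = dict(f_start=700.0, f_end=300.0, t_mid=0.10, tau=0.02,
                  depth=-150.0, t_c=0.10, width=0.015)
    for i in range(0, 21, 5):
        t = i * 0.01  # time in seconds
        print(f"t={t:.2f} s  F1={formant_trajectory(t, **params):.1f} Hz")
```

In a sketch like this, a change of speaking rate could be imitated by scaling the timing parameters (`t_mid`, `tau`, `t_c`, `width`) while leaving the frequency targets fixed; whether the paper handles rate in this way is not stated in the abstract.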
Bibliographic reference. Imaizumi, Satoshi / Imagawa, Hiroshi / Kiritani, Shigeru (1990): "A model of dynamic characteristics of the voice source and formant trajectories", In ICSLP-1990, 173-176.