Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Simultaneous Estimation of Vocal Tract and Voice Source Parameters With Application to Speech Synthesis

Wen Ding, Hideki Kasuya, Shuichi Adachi

Faculty of Engineering, Utsunomiya University, Utsunomiya, Japan

To synthesize natural-sounding speech with voice quality variations, we propose a concatenative synthesis method based on stored formant/antiformant templates of vowel-consonant-vowel (VCV) segments and on sophisticated control of voice source parameters. Using the parametric Rosenberg-Klatt (RK) model to generate a voiced source waveform and an autoregressive exogenous (ARX) model to represent the voiced speech production process, we have devised a new adaptive pitch-synchronous analysis method to estimate the model parameters, from which the templates are created semiautomatically. A Kalman filter algorithm handles the ARX model identification, and a simulated annealing method performs the nonlinear optimization that estimates the voice source parameters. The method has been tested on synthetic speech sounds and compared with other approaches in terms of the accuracy of the estimated parameter values. Preliminary synthesis experiments have shown that the proposed method can generate natural-sounding speech with various voice qualities by manipulating the voice source parameters.
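As an illustration of the two models named above, the following is a minimal sketch, not the authors' implementation. It assumes a commonly cited polynomial form of the RK glottal flow, a*t^2 - b*t^3 during the open phase and zero during the closed phase, and the standard ARX difference equation s(n) = -sum_i a_i s(n-i) + sum_j b_j u(n-j) + e(n). The function names, the open quotient parameter, the peak-flow normalization, and the sign convention are assumptions made for this example; the Kalman filter identification and the simulated annealing search are not shown.

```python
def rk_flow(n_period, open_quotient=0.6, amplitude=1.0):
    """One pitch period of an RK-style glottal flow (hypothetical helper).

    Assumed form: a*t^2 - b*t^3 in the open phase, zero in the closed phase.
    a and b are chosen here so the flow peaks at `amplitude` and returns
    to zero at glottal closure; this normalization is an assumption.
    """
    n_open = max(1, int(round(open_quotient * n_period)))
    a = 27.0 * amplitude / (4.0 * n_open ** 2)
    b = a / n_open
    return [a * t * t - b * t ** 3 if t < n_open else 0.0
            for t in range(n_period)]


def rk_flow_derivative(n_period, open_quotient=0.6, amplitude=1.0):
    """Time derivative of the assumed RK flow: 2*a*t - 3*b*t^2."""
    n_open = max(1, int(round(open_quotient * n_period)))
    a = 27.0 * amplitude / (4.0 * n_open ** 2)
    b = a / n_open
    return [2.0 * a * t - 3.0 * b * t * t if t < n_open else 0.0
            for t in range(n_period)]


def arx_filter(u, a_coefs, b_coefs, e=None):
    """Run the assumed ARX equation with zero initial conditions.

    s[n] = -sum_i a_coefs[i-1] * s[n-i] + sum_j b_coefs[j] * u[n-j] + e[n]
    """
    s = []
    for n in range(len(u)):
        acc = e[n] if e is not None else 0.0
        # exogenous (source) part
        for j, bj in enumerate(b_coefs):
            if n - j >= 0:
                acc += bj * u[n - j]
        # autoregressive (vocal tract) part
        for i, ai in enumerate(a_coefs, start=1):
            if n - i >= 0:
                acc -= ai * s[n - i]
        s.append(acc)
    return s


# Usage: three identical pitch periods driven through a toy one-pole filter.
source = rk_flow_derivative(100, open_quotient=0.6) * 3
speech = arx_filter(source, a_coefs=[-0.9], b_coefs=[1.0])
```

In this sketch, changing `open_quotient` or `amplitude` alters the source waveform while the filter coefficients stay fixed, which mirrors the separation of voice source and vocal tract parameters that the abstract relies on.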

