Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

Predicting Underlying Pitch Targets for Intonation Modeling

Xuejing Sun

Department of Communications Sciences and Disorders, Northwestern University, Evanston, IL, USA

This paper reports our preliminary attempt at modeling intonation using underlying pitch targets. The underlying pitch targets were derived using a nonlinear regression technique under the pitch target approximation model. We assume that underlying pitch targets can capture the most important intonation patterns while maintaining critical predictive power. Another important aspect of our approach is that it does not rely on pitch accent as a component of the system. To predict the parameters of the underlying targets, we used a recurrent neural network combined with a time-delay window. Between the predicted and original pitch targets, the root mean square error (RMSE) is 7.96 Hz and the correlation coefficient (r) is 0.678. The results are encouraging and suggest that the use of underlying pitch targets is a promising approach to intonation modeling.
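
The two evaluation measures quoted in the abstract, RMSE and the correlation coefficient r, can be sketched as below. This is only an illustration of the standard definitions, not the paper's evaluation code: the function names and the sample pitch values are hypothetical, and r is assumed to be the Pearson correlation coefficient.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences (here in Hz)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, reference):
    """Pearson correlation coefficient; always lies in [-1, 1]."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical pitch target values in Hz, for illustration only.
reference_hz = [200.0, 210.0, 190.0, 180.0]
predicted_hz = [205.0, 205.0, 195.0, 175.0]

error_hz = rmse(predicted_hz, reference_hz)
correlation = pearson_r(predicted_hz, reference_hz)
print(f"RMSE = {error_hz:.2f} Hz, r = {correlation:.3f}")
```

RMSE is expressed in the same unit as the targets (Hz), while r is dimensionless and bounded by 1 in absolute value, so the two measures give complementary views of prediction quality.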


Bibliographic reference.  Sun, Xuejing (2001): "Predicting underlying pitch targets for intonation modeling", In SSW4-2001, paper 126.