Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
This paper reports a preliminary attempt at modeling intonation using underlying pitch targets. The targets were derived by nonlinear regression under the pitch target approximation model. We assume that underlying pitch targets capture the most important intonation patterns while retaining critical predictive power. Another important aspect of our approach is that it does not rely on pitch accent as a component of the system. To predict the parameters of the underlying targets, we used a recurrent neural network combined with a time-delay window. Between the predicted and original pitch targets, the root mean square error (RMSE) is 7.96 Hz, and the correlation coefficient (r) is 6.78. These results are encouraging and suggest that the use of underlying pitch targets is a promising approach to intonation modeling.
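As an illustration of the two evaluation measures named in the abstract, the following minimal sketch computes the RMSE and the Pearson correlation coefficient between predicted and original pitch values. The function names and the sample values are hypothetical and are not taken from the paper; the paper does not describe how its figures were computed beyond naming the measures.

```python
import math


def rmse(predicted, original):
    """Root mean square error between two equal-length sequences (same unit as the input, e.g. Hz)."""
    if len(predicted) != len(original) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, original)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, original):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(original) or len(predicted) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(original) / n
    # Sums of centered products and centered squares.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, original))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in original)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Hypothetical pitch values in Hz, for illustration only (not data from the paper).
    original_hz = [120.0, 135.0, 150.0, 140.0, 125.0]
    predicted_hz = [118.0, 138.0, 145.0, 143.0, 130.0]
    print("RMSE (Hz):", rmse(predicted_hz, original_hz))
    print("r:", pearson_r(predicted_hz, original_hz))
```

RMSE is reported in the unit of the compared quantity (Hz here), while the correlation coefficient is unitless, so the two measures complement each other: one reflects absolute deviation, the other how well the predicted contour tracks the shape of the original.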
Bibliographic reference. Sun, Xuejing (2001): "Predicting underlying pitch targets for intonation modeling", In SSW4-2001, paper 126.