Speech Prosody 2004

Nara, Japan
March 23-26, 2004

Automatic Analysis and Synthesis of Fujisaki's Intonation Model for TTS

Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte

Department of Signal Theory and Communications, TALP Research Center, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

This paper deals with the automatic analysis and synthesis of intonation using Fujisaki's model. We propose an analysis method that imposes strong linguistic constraints. The method produces good representations of the F0 contour compared with other current methods that do not impose such constraints. Furthermore, the constrained analysis limits the variability of the parameters and makes them more predictable, so it is the best option for prediction (at least when accent commands are related to accent groups). Several prediction algorithms are evaluated. The results show that VCART (an extension of CART that predicts vector values) performs better than standard CART and neural networks. The paper also analyzes which features are most relevant for predicting the parameters of Fujisaki's model.
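
For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of the standard Fujisaki formulation, in which log F0 is a base value plus a sum of phrase components (impulse responses) and accent components (step responses). It is not the analysis or prediction method proposed in the paper; the function names and the default values of alpha, beta and gamma are illustrative assumptions, not values taken from the paper.

```python
import math

def phrase_response(t, alpha=2.0):
    """Phrase control impulse response Gp(t); zero before the command."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response Ga(t), capped at gamma; zero before the command."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude) pairs
    accent_cmds -- list of (onset_time, offset_time, amplitude) triples
    alpha, beta, gamma -- illustrative defaults (assumed, not from the paper)
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

# Hypothetical example: one phrase command and one accent command,
# sampled every 10 ms over two seconds.
contour = [fujisaki_f0(i * 0.01, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
           for i in range(200)]
```

In this sketch, synthesis amounts to evaluating the contour from a small set of command times and amplitudes, which are the quantities a prediction algorithm would need to supply.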


Bibliographic reference. Agüero, Pablo Daniel / Wimmer, Klaus / Bonafonte, Antonio (2004): "Automatic analysis and synthesis of Fujisaki's intonation model for TTS", In SP-2004, 427-430.