EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Corpus-Based Synthesis of Fundamental Frequency Contours Based on A Generation Process Model

Keikichi Hirose (1), Masaya Eto (1), Nobuaki Minematsu (1), Atsuhiro Sakurai (2)

(1) University of Tokyo, Japan
(2) Tsukuba R&D Center, Texas Instruments Japan, Japan

A model-constrained, corpus-based synthesis strategy was developed for F0 contours of Japanese sentences. In the training phase, neural networks learn the relationship between linguistic factors and the command values of the F0 contour generation process model. The input parameters consist of linguistic information about accentual phrases that can be derived automatically from text, such as the number of morae. In the synthesis phase, the trained network generates the command values. The method was also implemented with multiple linear regression analysis in order to examine how each input parameter contributes to F0 contour generation. Using a parametric model restricts the degrees of freedom of the mapping between linguistic and prosodic features, which makes it possible to generate appropriate values even with limited training data. Experimental results showed that the method could generate F0 contours quite close to those produced by the rule-based method.
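
To make the role of the command values concrete, the following is a minimal sketch of how an F0 contour could be computed from phrase and accent commands. It assumes that the generation process model is the Fujisaki-style command-response model, which the abstract does not state explicitly. The function names, the constants ALPHA, BETA and GAMMA, and the command values in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math

# Assumed, commonly quoted model constants; not taken from the paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    fb          : baseline frequency in Hz
    phrase_cmds : list of (onset_time, magnitude)
    accent_cmds : list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        # Components are summed in the log-frequency domain.
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands for one accentual phrase, sampled every 10 ms.
times = [i * 0.01 for i in range(200)]
f0 = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

In a setup like the one described above, the command magnitudes and timings passed to `f0_contour` would be the outputs predicted by the trained network or the regression model, rather than hand-written constants. Because only a handful of command values per accentual phrase have to be predicted, the mapping has far fewer degrees of freedom than predicting F0 frame by frame.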

Bibliographic reference.  Hirose, Keikichi / Eto, Masaya / Minematsu, Nobuaki / Sakurai, Atsuhiro (2001): "Corpus-based synthesis of fundamental frequency contours based on a generation process model", In EUROSPEECH-2001, 2255-2258.