Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Realization of Prosodic Focuses in Corpus-based Generation of Fundamental Frequency Contours of Japanese Based on the Generation Process Model

Keiko Ochi, Keikichi Hirose, Nobuaki Minematsu

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, University of Tokyo, Japan

A method was developed for generating sentence F0 contours of Japanese when a focus is placed on one of the “bunsetsu” of an utterance. It controls F0 through the F0 model rather than through frame-by-frame F0 prediction, as is done in HMM-based speech synthesis. The method first predicts the differences in the F0 model commands between utterances with and without focus, and then applies them to the F0 model commands predicted beforehand by the baseline method without focus assignment. The baseline method is trained on a large corpus, while the corpus for training the command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was demonstrated experimentally through F0 contour generation and speech synthesis, including interpolation/extrapolation of the F0 model commands for focus level control.
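The two-step idea described above (baseline commands plus predicted focus differences, with a scaling factor for focus level) can be illustrated with a minimal sketch. This is not the authors' implementation: the `AccentCommand` class, the `apply_focus` function, the `level` parameter, the restriction to accent command amplitudes, the clamping at zero, and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccentCommand:
    """Hypothetical container for one accent command of the F0 model."""
    onset: float      # command onset time in seconds
    offset: float     # command offset time in seconds
    amplitude: float  # command amplitude


def apply_focus(baseline: List[AccentCommand],
                deltas: List[float],
                level: float = 1.0) -> List[AccentCommand]:
    """Add scaled amplitude differences to baseline commands.

    level = 0 reproduces the baseline (no focus), level = 1 applies the
    predicted differences as they are, 0 < level < 1 interpolates, and
    level > 1 extrapolates. This linear scaling is an assumption.
    """
    if len(baseline) != len(deltas):
        raise ValueError("one difference value is required per command")
    focused = []
    for cmd, delta in zip(baseline, deltas):
        # Clamp at zero so an amplitude never becomes negative (assumption).
        amplitude = max(0.0, cmd.amplitude + level * delta)
        focused.append(AccentCommand(cmd.onset, cmd.offset, amplitude))
    return focused


# Illustrative values only, not taken from the paper.
baseline = [AccentCommand(0.10, 0.45, 0.40), AccentCommand(0.60, 0.95, 0.30)]
deltas = [0.20, -0.10]  # raise the focused phrase, lower the following one

no_focus = apply_focus(baseline, deltas, level=0.0)
focused = apply_focus(baseline, deltas, level=1.0)
emphasized = apply_focus(baseline, deltas, level=2.0)
```

In this sketch the baseline commands and the differences come from separate predictors, which mirrors the point that the difference predictor can be trained on a much smaller corpus than the baseline.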

Index Terms: Generation process model, F0 contour, Corpus-based method, Speech synthesis, Prosodic focus


Bibliographic reference.  Ochi, Keiko / Hirose, Keikichi / Minematsu, Nobuaki (2010): "Realization of prosodic focuses in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model", In SP-2010, paper 880.