Speech Prosody 2010
Chicago, IL, USA
A method was developed for generating sentence F0 contours of Japanese when a focus is placed on one of the bunsetsu of an utterance. It controls F0 through the F0 model rather than by frame-by-frame F0 prediction, as is done in HMM-based speech synthesis. The method first predicts the differences in the F0 model commands between utterances with and without focus, and then applies them to the F0 model commands predicted beforehand by the baseline method without focus assignment. The baseline method is trained on a large corpus, whereas the corpus for training the command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was demonstrated in an experiment on F0 contour generation and speech synthesis, including interpolation/extrapolation of the F0 model commands for focus level control.
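The idea of adding predicted command differences to baseline commands, and of scaling those differences to control the focus level, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the generation process model in its usual form (log F0 as a baseline value plus phrase and accent components), restricts the differences to accent command amplitudes, and treats focus level control as a linear scaling factor `level`. The names `PhraseCommand`, `AccentCommand`, `apply_focus`, and `f0_contour`, the constants `alpha`, `beta`, and `gamma`, and all numeric values are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhraseCommand:
    t0: float  # onset time in seconds
    ap: float  # magnitude


@dataclass(frozen=True)
class AccentCommand:
    t1: float  # onset time in seconds
    t2: float  # offset time in seconds
    aa: float  # amplitude


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (assumed alpha)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism (assumed beta, gamma)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrases, accents):
    """Return F0 in Hz at each time: ln F0 = ln Fb + phrase + accent parts."""
    contour = []
    for t in times:
        ln_f0 = math.log(fb)
        ln_f0 += sum(p.ap * phrase_response(t - p.t0) for p in phrases)
        ln_f0 += sum(
            a.aa * (accent_response(t - a.t1) - accent_response(t - a.t2))
            for a in accents
        )
        contour.append(math.exp(ln_f0))
    return contour


def apply_focus(accents, aa_diffs, level=1.0):
    """Add scaled amplitude differences to baseline accent commands.

    level = 0 keeps the baseline, level = 1 applies the predicted
    differences, values in between interpolate, values above 1
    extrapolate. Amplitudes are clipped at zero (an assumption).
    """
    return [
        replace(a, aa=max(0.0, a.aa + level * d))
        for a, d in zip(accents, aa_diffs)
    ]


# Hypothetical baseline commands and predicted differences
# (focus on the first accent, reduced amplitude on the second).
phrases = [PhraseCommand(t0=0.0, ap=0.5)]
baseline = [AccentCommand(0.2, 0.5, 0.4), AccentCommand(0.7, 1.0, 0.3)]
diffs = [0.2, -0.1]

times = [i * 0.01 for i in range(120)]
neutral_f0 = f0_contour(times, 120.0, phrases, baseline)
focused_f0 = f0_contour(times, 120.0, phrases, apply_focus(baseline, diffs))
strong_f0 = f0_contour(times, 120.0, phrases, apply_focus(baseline, diffs, 2.0))
```

Because the differences act on a handful of model commands rather than on every frame, changing `level` reshapes the whole contour smoothly; in the sketch, the focused accent is raised while the following accent is lowered.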
Index Terms: Generation process model, F0 contour, Corpus-based method, Speech synthesis, Prosodic focus
Bibliographic reference. Ochi, Keiko / Hirose, Keikichi / Minematsu, Nobuaki (2010): "Realization of prosodic focuses in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model", In SP-2010, paper 880.