Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model

Keikichi Hirose, Hiroya Hashimoto, Jun Ikeshima, Nobuaki Minematsu

Department of Information and Communication Engineering, the University of Tokyo, Tokyo

Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning long stretches of time, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our previously developed method, which modifies generated F0 contours in the framework of the generation process model, is improved to allow multiple phrase components in a breath group. Since the model clearly relates its commands to linguistic (and para-/non-linguistic) information, the method further enables flexible control of prosody through manipulation of the model commands. Prosodic focus is realized in HMM-based speech synthesis as a supplemental process, based on the differences in command magnitudes/amplitudes between utterances with and without focus. The validity of the method was confirmed through listening experiments on synthetic speech.
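The abstract does not give the model equations, so the following is only a minimal sketch. It assumes that the generation process model is the Fujisaki model in its commonly cited form, where ln F0 is a baseline plus phrase components (impulse responses) and accent components (step responses). The constants, the command values, and the `apply_focus` rule with its gain and attenuation factors are illustrative assumptions, not the values or the procedure used by the authors.

```python
import math

# Commonly cited constants for the model (assumed here, not taken from the paper).
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents):
    """ln F0 at time t (seconds).

    phrases: list of (onset, magnitude) phrase commands
    accents: list of (onset, offset, amplitude) accent commands
    """
    value = math.log(fb)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def apply_focus(accents, focused_index, gain=1.5, attenuation=0.8):
    """Hypothetical focus rule: boost the focused accent command and
    attenuate the others. The factors are placeholders."""
    return [
        (t1, t2, aa * (gain if i == focused_index else attenuation))
        for i, (t1, t2, aa) in enumerate(accents)
    ]


# Two phrase commands inside one breath group (illustrative values).
phrases = [(0.0, 0.5), (1.2, 0.3)]
accents = [(0.2, 0.6, 0.4), (1.4, 1.9, 0.4)]
focused = apply_focus(accents, focused_index=1)

times = [i * 0.005 for i in range(500)]  # 5 ms frames over 2.5 s
neutral_contour = [math.exp(log_f0(t, 120.0, phrases, accents)) for t in times]
focus_contour = [math.exp(log_f0(t, 120.0, phrases, focused)) for t in times]
```

Because the commands are tied to phrases and accents rather than to individual frames, changing a single command amplitude reshapes the contour over the whole span of that unit, which is the kind of control the abstract describes.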

Index Terms: fundamental frequency contour, generation process model, HMM-based speech synthesis, prosodic focus

Bibliographic reference.  Hirose, Keikichi / Hashimoto, Hiroya / Ikeshima, Jun / Minematsu, Nobuaki (2012): "Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model", In SP-2012, 171-174.