13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Japan

This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In a previous study, we proposed a prosodic-unit HMM in which the synthesis unit is defined as a segment between two prosodic events represented in the ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data that include unvoiced regions. Conventional F0 models such as the MSD-HMM and the continuous F0 HMM are not always appropriate for this requirement. To overcome this problem, we propose an alternative F0 model, the discontinuous observation HMM (DO-HMM), in which unvoiced frames are regarded as missing data. We objectively evaluate the performance of the DO-HMM by comparing it with the conventional F0 modeling techniques and discuss the results.

Index Terms: HMM-based speech synthesis, F0 modeling, prosody generation, discontinuous observation HMM, spontaneous speech.
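
The idea of regarding unvoiced frames as missing data can be illustrated with a generic missing-observation HMM likelihood. The following is a minimal sketch, not the formulation used in the paper: it assumes one single-Gaussian emission per state over log F0, marks unvoiced frames with NaN, and marginalizes the emission term out at those frames (a factor of 1 in the forward recursion). The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_gauss(x, means, variances):
    """Log density of scalar x under each state's Gaussian (vectorized over states)."""
    return -0.5 * (np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances)

def forward_loglik(log_f0, log_pi, log_A, means, variances):
    """Forward-algorithm log-likelihood of an F0 track.

    Frames equal to NaN are treated as missing: the state distribution is
    still propagated through the transitions, but no emission term is applied.
    """
    alpha = log_pi.copy()
    for t, x in enumerate(log_f0):
        if t > 0:
            # alpha_new[j] = logsumexp_i(alpha[i] + log_A[i, j])
            alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
        if not np.isnan(x):
            alpha = alpha + log_gauss(x, means, variances)
        # Missing frame: emission is marginalized out, so alpha is unchanged.
    return np.logaddexp.reduce(alpha)

# Hypothetical two-state model and a track with an unvoiced gap (NaN frames).
log_pi = np.log(np.array([0.6, 0.4]))
log_A = np.log(np.array([[0.9, 0.1],
                         [0.2, 0.8]]))
means = np.array([4.8, 5.2])        # assumed log-F0 means per state
variances = np.array([0.02, 0.03])  # assumed log-F0 variances per state
track = np.array([4.85, 4.90, np.nan, np.nan, 5.15, 5.20])

print(forward_loglik(track, log_pi, log_A, means, variances))
```

Because the missing frames still advance the state sequence, voiced frames on both sides of an unvoiced gap are tied together through the transition probabilities, which is what allows a continuous underlying trajectory to be modeled from discontinuous data in this simplified setting.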


Bibliographic reference. Koriyama, Tomoki / Nose, Takashi / Kobayashi, Takao (2012): "Discontinuous observation HMM for prosodic-event-based F0 generation", In INTERSPEECH-2012, 462-465.