Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Emotional Voice Conversion for Mandarin using Tone Nucleus Model – Small Corpus and High Efficiency

Miaomiao Wang (1), Miaomiao Wen (2), Keikichi Hirose (3), Nobuaki Minematsu (3)

(1) Toshiba (China) R&D Center
(2) Language Technologies Institute, Carnegie Mellon University, USA
(3) Graduate School of Information Science and Technology, University of Tokyo, Japan

GMM-based spectral conversion techniques have been applied to emotion conversion, but spectral transformation alone has been found insufficient to convey the target emotion. In this paper, we adopt the tone nucleus model, which carries the most important tonal information, to represent the F0 contour of Mandarin speech. The tone nucleus of each syllable is then converted from neutral to emotional speech. Variations in the tone nuclei are modeled with classification and regression trees (CART) and dynamic programming. Compared with previous prosody transformation methods, the proposed method 1) uses only the tone nucleus of each syllable rather than the whole F0 contour, which avoids data sparseness problems in emotion conversion, and 2) builds mapping functions for well-chosen tone nucleus model parameters to better capture Mandarin tonal and emotional information. Using only a modest amount of training data, the method achieved perceptual accuracy comparable to that obtained by a professional speaker.
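
As a rough illustration of converting only the tone nucleus rather than the whole F0 contour, the Python sketch below summarizes the nucleus frames of one syllable by two parameters, maps those parameters, and renders a new nucleus. It is not the implementation described in the paper: the choice of parameters (mean and slope of log-F0), the function names, the frame indices, and the lookup table `shift_table`, which merely stands in for a CART-predicted mapping, are all assumptions, and every number is made up.

```python
import math


def nucleus_params(f0_hz, start, end):
    """Summarize the non-empty nucleus f0_hz[start:end] as (mean, slope) of log-F0."""
    logs = [math.log(v) for v in f0_hz[start:end]]
    n = len(logs)
    mean = sum(logs) / n
    # Least-squares slope per frame, with time centred on the nucleus midpoint.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        return mean, 0.0  # a single frame has no slope
    slope = sum((t - t_mean) * (y - mean) for t, y in enumerate(logs)) / denom
    return mean, slope


def convert_nucleus(params, tone, shift_table):
    """Map neutral nucleus parameters to emotional ones.

    shift_table[tone] = (log-F0 offset, slope scale) is a hypothetical
    stand-in for the output of a trained regression tree.
    """
    mean, slope = params
    d_mean, slope_scale = shift_table[tone]
    return mean + d_mean, slope * slope_scale


def render_nucleus(params, n_frames):
    """Render the nucleus as a straight line in log-F0, returned in Hz."""
    mean, slope = params
    t_mean = (n_frames - 1) / 2
    return [math.exp(mean + slope * (t - t_mean)) for t in range(n_frames)]


# Toy rising-tone syllable; frames 2..5 are assumed to be the nucleus.
syllable_f0 = [200.0, 205.0, 210.0, 220.0, 230.0, 240.0, 238.0, 236.0]
toy_table = {2: (math.log(1.2), 1.5)}  # made-up values for Tone 2

neutral = nucleus_params(syllable_f0, 2, 6)
emotional = convert_nucleus(neutral, 2, toy_table)
print(render_nucleus(emotional, 4))
```

Working on a handful of nucleus parameters per syllable, rather than on every F0 frame, keeps the number of quantities to be predicted small, which is in the spirit of the data sparseness argument made in the abstract.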

Index Terms: Emotional voice conversion, Mandarin, Tone nucleus

Bibliographic reference.  Wang, Miaomiao / Wen, Miaomiao / Hirose, Keikichi / Minematsu, Nobuaki (2012): "Emotional voice conversion for Mandarin using tone nucleus model – small corpus and high efficiency", In SP-2012, 163-166.