Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Communicative Speech Synthesis with XIMERA: A First Step

Shinsuke Sakai (1,2), Jinfu Ni (1,2), Ranniery Maia (1,2), Keiichi Tokuda (1,3), Minoru Tsuzaki (1,4), Tomoki Toda (1,5), Hisashi Kawai (2,6), Satoshi Nakamura (1,2)

(1) National Inst. of Inform. and Comm. Tech. (NiCT), Japan
(2) ATR Spoken Language Comm. Labs, Japan
(3) Nagoya Institute of Technology, Japan
(4) Kyoto City University of Arts, Japan
(5) Nara Institute of Science and Technology, Japan
(6) KDDI Research and Development Labs, Japan

This paper presents a corpus-based approach to communicative speech synthesis. For our initial attempt to synthesize speech with the expressiveness appropriate to human-human or human-machine dialog, we chose a "good news" style and a "bad news" style. We used a 10-hour "neutral" style speech corpus together with smaller good news and bad news corpora, each consisting of two to three hours of speech from the same speaker. We trained target HMMs for each style and synthesized speech from unit databases containing speech in the relevant style as well as neutral speech. Listening tests showed that listeners comprehended the intended communicative styles and that a considerably high mean opinion score for naturalness was achieved with rather small, style-specific corpora.

Bibliographic reference.  Sakai, Shinsuke / Ni, Jinfu / Maia, Ranniery / Tokuda, Keiichi / Tsuzaki, Minoru / Toda, Tomoki / Kawai, Hisashi / Nakamura, Satoshi (2007): "Communicative speech synthesis with XIMERA: a first step", In SSW6-2007, 28-33.