Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

Synthesis of Emotional Speech Using Prosodically Balanced VCV Segments

Yasuhisa Niimi, Masanori Kasamatsu, Takuya Nishinoto, and Masahiro Araki

Kyoto Institute of Technology, Japan

This paper describes a system that synthesizes emotional speech based on TD-PSOLA. The system has a database of VCV (vowel-consonant-vowel) segments for each of three emotions: anger, sadness, and joy. These segments have emotional speech quality. The database contains four kinds of VCV segments, which are prosodically balanced in the sense that their concatenation can generate any accent pattern of Japanese. The system also has a duration formula for each phoneme and each emotion that estimates the length of that phoneme given its phonemic and linguistic context. For these purposes we collected a speech corpus for each emotion. Using the corpus, we derived a guideline for designing the VCV databases and performed a multiple regression analysis to derive the duration formulae. Seven utterances were produced for each emotion and presented to twelve listeners. The intended emotions were correctly recognized at an average rate of 84%.
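
As an illustration of how a duration formula of this kind could be obtained by multiple regression, the following is a minimal sketch only. The context factors, their levels, the coefficient values, and all function names are hypothetical, since the abstract does not specify them, and the training data is synthetic toy data rather than anything from the authors' corpus. In a setup like the one described, one such model would presumably be fitted separately for each phoneme and each emotion.

```python
import itertools
import numpy as np

# Hypothetical categorical context factors (assumption: the abstract
# does not list the factors actually used in the paper).
FACTORS = {
    "prev_class": ["vowel", "consonant", "pause"],
    "next_class": ["vowel", "consonant", "pause"],
    "accented": ["no", "yes"],
}

def encode(context):
    """Return an intercept followed by dummy variables for each factor.

    The first level of every factor is the reference level and gets no column.
    """
    row = [1.0]
    for name, levels in FACTORS.items():
        row.extend(1.0 if context[name] == level else 0.0 for level in levels[1:])
    return row

def fit_duration_formula(contexts, durations_ms):
    """Fit regression coefficients by ordinary least squares."""
    X = np.array([encode(c) for c in contexts])
    y = np.array(durations_ms, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_duration(coeffs, context):
    """Estimate a phoneme duration in milliseconds for one context."""
    return float(np.dot(coeffs, encode(context)))

# Synthetic toy data: every combination of factor levels, with durations
# generated from made-up coefficients (not measurements from any corpus).
TOY_COEFFS = np.array([80.0, 10.0, -5.0, -8.0, 12.0, 15.0])
names = list(FACTORS)
contexts = [dict(zip(names, combo))
            for combo in itertools.product(*FACTORS.values())]
durations = [float(np.dot(TOY_COEFFS, encode(c))) for c in contexts]

coeffs = fit_duration_formula(contexts, durations)
example = {"prev_class": "consonant", "next_class": "pause", "accented": "yes"}
print(predict_duration(coeffs, example))
```

Because the toy durations are generated without noise from a full set of factor combinations, the fit recovers the made-up coefficients; with real measurements the fit would only approximate the observed durations.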

Bibliographic reference.  Niimi, Yasuhisa / Kasamatsu, Masanori / Nishinoto, Takuya / Araki, Masahiro (2001): "Synthesis of emotional speech using prosodically balanced VCV segments", In SSW4-2001, paper 133.