Speech Prosody 2004

Nara, Japan
March 23-26, 2004

Speech Synthesis with Attitude

Yoshinori Sagisaka, Takumi Yamashita, Yoko Kokenawa

Global Information and Telecommunication Institute, Waseda University, Tokyo, Japan

F0 characteristics were analyzed and modeled to generate speech with natural prosody in communication systems. In human speech production, words are selected to express the speaker's attitude. We modeled prosody using information about the constituent words that express attitude and markedness. Motivated by preliminary observations of prosodic variation in conversational speech, we quantitatively analyzed F0 characteristics using simple phrases consisting of adjectives expressing a positive or negative attitude and adverbs expressing different degrees of markedness. When the adjective phrase following an adverb expressed a positive attitude, the markedness of the adverb correlated strongly and positively with F0 height. When it expressed a negative attitude, the correlation was strongly negative. These regularities were confirmed perceptually by naturalness evaluation tests. Finally, F0 control was modeled using lexical information on positive or negative attitude and markedness.
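The correlation analysis described above can be illustrated with a minimal sketch that computes a Pearson correlation between adverb markedness and measured F0 height separately for each attitude of the following adjective. The record layout `(markedness, f0_height, attitude)`, the numeric markedness scale, and the function names `pearson` and `correlation_by_attitude` are hypothetical choices for illustration. This is not the authors' analysis code, and no data from the paper is reproduced here.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_by_attitude(records):
    """Group hypothetical (markedness, f0_height, attitude) records by the
    attitude label of the following adjective and return {attitude: r}."""
    groups = defaultdict(lambda: ([], []))
    for markedness, f0_height, attitude in records:
        xs, ys = groups[attitude]
        xs.append(markedness)
        ys.append(f0_height)
    return {attitude: pearson(xs, ys) for attitude, (xs, ys) in groups.items()}
```

Splitting the records by attitude before correlating matters here. Pooling positive and negative contexts could cancel opposite-signed trends and hide the effect the abstract reports.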


Bibliographic reference. Sagisaka, Yoshinori / Yamashita, Takumi / Kokenawa, Yoko (2004): "Speech synthesis with attitude", In SP-2004, 401-404.