Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Whispered Speech Prosody Modeling for TTS Synthesis

Valery A. Petrushin (1), Liliya I. Tsirulnik (2), Veronika Makarova (3)

(1) The Nielsen Company, Schaumburg, IL, USA
(2) Speech Recognition and Synthesis Laboratory, United Institute of Informatics Problems of NAS of Belarus, Minsk, Belarus
(3) Department of Languages and Linguistics, University of Saskatchewan, Saskatoon, Canada

This paper is devoted to modeling the prosody of whispered Russian speech. The practical purpose of this research is to extend voice cloning techniques to the whispered speech modality. The authors present their analysis of the prosodic features that contribute to the expression of sentence-type intonation in whispered speech. The current investigation covers intonation contours in complete and incomplete declaratives, as well as in interrogatives and exclamations. Since the fundamental frequency is absent in whisper, the major role in conveying sentence-type intonation is taken over by formant values. For modeling the prosody of whispered speech, an extension of the Accent Unit Portrait Model is proposed. The paper demonstrates how melodic, rhythmic and dynamic (energy) portraits of accent units can be built and employed for whispered speech modifications by a concatenative text-to-speech synthesizer.

Index Terms: whispered speech, prosody modeling, speech synthesis, accent unit portrait model, formant modification.

Bibliographic reference. Petrushin, Valery A. / Tsirulnik, Liliya I. / Makarova, Veronika (2010): "Whispered speech prosody modeling for TTS synthesis", in SP-2010, paper 288.