The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

EM-HTS: Real-Time HMM-Based Malay Emotional Speech Synthesis

Mumtaz B. Mustafa, Raja N. Ainon, Roziati Zainuddin

Faculty of Computer Science and Information Technology, University of Malaysia

This research aims to develop a real-time HMM-based Malay emotional speech synthesis system (EM-HTS) that can synthesize arbitrary text input in four expressions: neutral, anger, sadness and happiness. The quality of the emotional speech synthesis was improved by using the Neutral to Angry, Sad, and Happy (NASH) duration generator, which uses a context-dependent duration generation method to improve the duration information in the label files of the target emotions for training purposes. We conducted three forms of evaluation to determine the accuracy, intelligibility and naturalness of the speech generated by EM-HTS. All three tests show that the adopted method (NASH) gives a better reproduction of prosody than the conventional method using the same training speech data.

Index Terms: HMM-based emotional speech synthesis, context-dependent duration conversion

Bibliographic reference.  Mustafa, Mumtaz B. / Ainon, Raja N. / Zainuddin, Roziati (2010): "EM-HTS: real-time HMM-based Malay emotional speech synthesis", In SSW7-2010, 340-344.