13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data

Fanbo Meng (1), Zhiyong Wu (2,3), Helen Meng (2,3), Jia Jia (1,3), Lianhong Cai (1,3)

(1) Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, China
(2) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, China
(3) Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China

Emphasis is an important form of expressiveness in speech. Hidden Markov model (HMM) based synthesis has shown great flexibility in generating expressive speech. This paper proposes a hierarchical HMM-based model that aims to synthesize emphatic speech with both high emphasis quality and high naturalness from limited data. The decision tree (DT) is constructed with non-emphasis questions using both neutral and emphasis corpora. We classify the data in each leaf of the DT into 6 emphasis categories according to the emphasis-related questions. The data of the same emphasis category are grouped into one sub-node and are used to train one HMM. As some leaves of the DT may contain no data for certain emphasis categories, a cost-based method is proposed to select a suitable HMM, trained from the data of another sub-node in the same leaf, for predicting parameters. Further, a compensation model is proposed to adjust the predicted parameters. Experiments show that the proposed hierarchical model can synthesize emphatic speech with high quality in both naturalness and emphasis, using a limited amount of training data.
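
The leaf-level fallback described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the cost function, the form of the compensation model, or the names of the 6 emphasis categories, so everything here is an assumption made for illustration: the names `Leaf`, `select_model`, `index_distance`, and `compensate` are hypothetical, the categories are plain indices 0 to 5, each HMM is reduced to a single mean value, the cost is the distance between category indices, and the compensation is an additive offset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Assumption: the abstract states there are 6 emphasis categories but does
# not name them, so plain indices 0..5 stand in for them here.
NUM_CATEGORIES = 6


@dataclass
class Leaf:
    """One decision-tree leaf; sub_nodes maps emphasis category -> model.

    The "model" is reduced to one mean value for illustration. A real HMM
    state would hold distributions over spectrum, F0, and duration.
    """
    sub_nodes: Dict[int, float] = field(default_factory=dict)


def index_distance(target: int, candidate: int) -> float:
    """Hypothetical cost: distance between category indices.

    This is a stand-in; the paper's actual cost calculation is not given
    in the abstract.
    """
    return float(abs(target - candidate))


def select_model(
    leaf: Leaf,
    target: int,
    cost: Callable[[int, int], float] = index_distance,
) -> Tuple[int, float]:
    """Return (category used, model) for the requested emphasis category.

    If the leaf has a sub-node for the target category, use it. Otherwise
    fall back to the sub-node in the same leaf with the lowest cost.
    """
    if not 0 <= target < NUM_CATEGORIES:
        raise ValueError(f"unknown emphasis category: {target}")
    if not leaf.sub_nodes:
        raise ValueError("leaf has no trained sub-node")
    if target in leaf.sub_nodes:
        return target, leaf.sub_nodes[target]
    # Ties are broken in favour of the lower category index.
    best = min(leaf.sub_nodes, key=lambda c: (cost(target, c), c))
    return best, leaf.sub_nodes[best]


def compensate(
    value: float,
    source: int,
    target: int,
    offsets: Dict[Tuple[int, int], float],
) -> float:
    """Hypothetical additive compensation from source to target category."""
    return value + offsets.get((source, target), 0.0)


# Usage: a leaf with training data for categories 0 and 3 only.
leaf = Leaf({0: 100.0, 3: 130.0})
used, predicted = select_model(leaf, target=4)
adjusted = compensate(predicted, used, 4, offsets={(3, 4): 8.0})
```

In this sketch the request for category 4 falls back to the sub-node for category 3, the closest available one under the assumed cost, and the predicted value is then shifted by the assumed offset for that pair of categories.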

Index Terms: emphatic speech synthesis, hidden Markov model (HMM), hierarchy, compensation model


Bibliographic reference. Meng, Fanbo / Wu, Zhiyong / Meng, Helen / Jia, Jia / Cai, Lianhong (2012): "Hierarchical English emphatic speech synthesis based on HMM with limited training data", in INTERSPEECH-2012, 466-469.