INTERSPEECH 2006 - ICSLP
This paper presents a hidden Markov model (HMM) based unit selection method for a concatenative speech synthesis system. Frame-sized waveform segments are adopted as the basic synthesis units to increase the coverage rate of candidate units and the chance of finding appropriate ones. In the training stage, a set of context-dependent HMMs is trained on static and dynamic acoustic features. When a sentence is synthesized, the optimal frame sequence is searched from the speech corpus by maximizing the output probability of a sentence HMM constructed according to the contextual information of the input text. A listening test shows that the proposed method achieves better synthesized speech quality than a method that uses state-sized units and a cost-function criterion.
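The abstract does not give the search equations, so the following is only a minimal sketch of how a frame-level, maximum-likelihood search of this kind could be organized. It assumes a fixed frame-to-state alignment from the sentence HMM, diagonal-Gaussian output densities, and a simplified first-order delta Δc_t = c_t − c_{t−1} instead of the regression-window deltas usually used. Under these assumptions the sentence-HMM log-probability splits into per-frame static terms plus pairwise delta terms, and Viterbi-style dynamic programming over the candidate frames finds the optimal sequence. The names `diag_gauss_logpdf` and `select_frames` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical per-frame target: (static_mean, static_var, delta_mean, delta_var)
# of the HMM state aligned to that frame (alignment assumed to be given).
Target = Tuple[Vector, Vector, Vector, Vector]


def diag_gauss_logpdf(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_frames(
    targets: List[Target], candidates: List[List[Vector]]
) -> Tuple[List[int], float]:
    """Pick one corpus frame per target frame, maximizing the total log-probability.

    candidates[t] holds static feature vectors of the corpus frames allowed at frame t.
    Assumes the simplified delta c_t - c_{t-1}, so the objective is first-order Markov.
    """
    # Frame 0 has no predecessor, so only its static term is scored.
    sm, sv, _, _ = targets[0]
    score = [diag_gauss_logpdf(c, sm, sv) for c in candidates[0]]
    back: List[List[int]] = [[-1] * len(candidates[0])]

    for t in range(1, len(targets)):
        sm, sv, dm, dv = targets[t]
        new_score, new_back = [], []
        for c in candidates[t]:
            static_lp = diag_gauss_logpdf(c, sm, sv)
            best_j, best = -1, -math.inf
            for j, prev in enumerate(candidates[t - 1]):
                # Delta term couples consecutive frames.
                delta = [ci - pi for ci, pi in zip(c, prev)]
                s = score[j] + diag_gauss_logpdf(delta, dm, dv)
                if s > best:
                    best_j, best = j, s
            new_score.append(best + static_lp)
            new_back.append(best_j)
        score, _ = new_score, back.append(new_back)

    # Backtrack from the best final candidate.
    i = max(range(len(score)), key=lambda k: score[k])
    total = score[i]
    path = [i]
    for t in range(len(targets) - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy 1-D example: the target trajectory rises 0 -> 1 -> 2.
    targets = [
        ((0.0,), (1.0,), (0.0,), (1.0,)),
        ((1.0,), (1.0,), (1.0,), (0.1,)),
        ((2.0,), (1.0,), (1.0,), (0.1,)),
    ]
    candidates = [[(0.0,), (5.0,)], [(1.0,), (0.0,)], [(2.0,), (3.0,)]]
    path, total = select_frames(targets, candidates)
    print(path)  # → [0, 0, 0]
```

The delta term is what makes the search a sequence optimization rather than independent per-frame picks. In the toy example, a candidate that matches its static mean but jumps sharply from the previous frame is penalized through the tight delta variance.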
Bibliographic reference. Ling, Zhen-Hua / Wang, Ren-Hua (2006): "HMM-based unit selection using frame sized speech segments", In INTERSPEECH-2006, paper 1104-Wed3BuP.3.