Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

HMM-Based Unit Selection Using Frame Sized Speech Segments

Zhen-Hua Ling, Ren-Hua Wang

University of Science & Technology of China, China

This paper presents a hidden Markov model (HMM) based unit selection method for a concatenative speech synthesis system. Frame-sized waveform segments are adopted as the basic synthesis units to increase the coverage of candidate units and the chance of finding appropriate ones. In the training stage, a set of context-dependent HMMs is trained with static and dynamic acoustic features. When a sentence is synthesized, the optimal frame sequence is searched for in the speech corpus by maximizing the output probability of a sentence HMM constructed from the contextual information of the input text. A listening test shows that the proposed method produces better synthesized speech than the method that uses state-sized units and a cost-function criterion.
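The abstract does not spell out the search procedure, so the following is only a minimal sketch of the idea of choosing one corpus frame per output frame to maximize the likelihood of static and dynamic features under a sentence HMM. It assumes (all hypothetical, not taken from the paper) that the sentence HMM has already been expanded into per-frame diagonal Gaussians (a fixed state alignment), and that the dynamic feature is the simple first difference c_t - c_{t-1}. Real HMM-based systems usually compute deltas over a wider regression window, which would make the search more expensive; with a first difference, each delta term depends only on adjacent frames, so a Viterbi-style dynamic programme finds the exact maximum. The names `log_gauss` and `select_frames` are illustrative.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def log_gauss(x: Vec, mean: Vec, var: Vec) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def select_frames(candidates: List[List[Vec]],
                  static_means: List[Vec], static_vars: List[Vec],
                  delta_means: List[Vec], delta_vars: List[Vec]) -> List[int]:
    """Pick one candidate frame per target frame to maximize total log-likelihood.

    candidates[t]   -- candidate feature vectors (e.g. from the corpus) for frame t
    static_*[t]     -- Gaussian for the static feature c_t (hypothetical alignment)
    delta_*[t]      -- Gaussian for the assumed delta c_t - c_{t-1}; unused at t = 0
    """
    if not candidates:
        return []
    # score[t][j]: best log-likelihood of frames 0..t with candidate j chosen at t
    score = [[log_gauss(c, static_means[0], static_vars[0]) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[t]:
            static_ll = log_gauss(c, static_means[t], static_vars[t])
            best, best_i = -math.inf, -1
            for i, prev in enumerate(candidates[t - 1]):
                delta = [a - b for a, b in zip(c, prev)]
                s = score[t - 1][i] + log_gauss(delta, delta_means[t], delta_vars[t])
                if s > best:
                    best, best_i = s, i
            row.append(best + static_ll)
            ptr.append(best_i)
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final candidate.
    j = max(range(len(score[-1])), key=lambda k: score[-1][k])
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]


# Usage: 1-D features; candidate 0 matches every static and delta mean exactly.
print(select_frames([[[0.0], [5.0]], [[1.0], [-3.0]], [[2.0], [9.0]]],
                    [[0.0], [1.0], [2.0]], [[1.0]] * 3,
                    [[0.0], [1.0], [1.0]], [[1.0]] * 3))  # [0, 0, 0]
```

Because the delta term couples neighbouring choices, a frame that fits its static Gaussian slightly worse can still be selected if it joins more smoothly to the previous frame; this is the role the dynamic features play in a likelihood criterion, in place of an explicit concatenation cost.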


Bibliographic reference.  Ling, Zhen-Hua / Wang, Ren-Hua (2006): "HMM-based unit selection using frame sized speech segments", In INTERSPEECH-2006, paper 1104-Wed3BuP.3.