The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

An HMM-Based Speech Synthesiser using Glottal Post-Filtering

João P. Cabral (1,2), Steve Renals (2), Korin Richmond (2), Junichi Yamagishi (2)

(1) School of Computer Science and Informatics, University College Dublin, Ireland
(2) The Centre for Speech Technology Research, University of Edinburgh, UK

Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify the voice characteristics of synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, allows the voice characteristics of the synthetic speech to be transformed by modifying the source model parameters.
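The whitening step can be illustrated with a minimal Python sketch, under several assumptions that are not taken from the paper. The source model is a Rosenberg glottal pulse, used only as a stand-in because the abstract does not name the model the authors use. The whitening filter is an all-pole LPC inverse filter A(z) estimated with the Levinson-Durbin recursion. That filter is estimated once from a reference pulse train and kept fixed when a source parameter is changed. All function names and parameter values (`rosenberg_pulse`, `ORDER`, the 80-sample period, etc.) are hypothetical; this is not the authors' implementation.

```python
import math


def rosenberg_pulse(period, open_len, close_len):
    """One period of a Rosenberg glottal flow pulse (hypothetical stand-in model)."""
    pulse = []
    for n in range(period):
        if n <= open_len:
            # opening phase: flow rises smoothly from 0 to 1
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / open_len)))
        elif n <= open_len + close_len:
            # closing phase: flow falls back to 0
            pulse.append(math.cos(math.pi * (n - open_len) / (2.0 * close_len)))
        else:
            # closed phase: no flow
            pulse.append(0.0)
    return pulse


def glottal_signal(n_periods, period, open_len, close_len):
    """First difference of a pulse train, i.e. the glottal flow derivative."""
    flow = rosenberg_pulse(period, open_len, close_len) * n_periods
    return [flow[n] - (flow[n - 1] if n > 0 else 0.0) for n in range(len(flow))]


def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n - k] for lags k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p and the final error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) signal: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def whiten(x, a):
    """Apply the FIR inverse filter A(z): e[n] = sum_k a[k] * x[n - k]."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


ORDER = 12
# Hypothetical reference source: 5 periods of 80 samples, 40-sample opening, 16-sample closing.
reference = glottal_signal(5, 80, 40, 16)
a, _ = levinson_durbin(autocorrelation(reference, ORDER), ORDER)
excitation = whiten(reference, a)

# Change a source parameter (shorter opening and closing phases) but keep the same filter.
modified = glottal_signal(5, 80, 28, 12)
modified_excitation = whiten(modified, a)
```

Keeping A(z) fixed while the opening and closing durations change means the spectral difference introduced by the new source parameters is preserved in the excitation rather than flattened away. This illustrates the kind of voice-quality control the abstract describes, but the abstract does not say whether the paper's post-filter is designed this way.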

We evaluated the proposed synthesiser in a perceptual experiment in terms of speech naturalness, intelligibility, and similarity to the original speaker's voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.

Index Terms: HMM-based speech synthesis, voice quality, glottal post-filter

Bibliographic reference: Cabral, João P. / Renals, Steve / Richmond, Korin / Yamagishi, Junichi (2010): "An HMM-based speech synthesiser using glottal post-filtering", In SSW7-2010, 365-370.