12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis

Zhen-Hua Ling (1), Korin Richmond (2), Junichi Yamagishi (2)

(1) USTC, China
(2) University of Edinburgh, UK

In previous work, we proposed a method for flexibly controlling the characteristics of synthetic speech by integrating articulatory features into hidden Markov model (HMM) based parametric speech synthesis. A unified acoustic-articulatory model was trained, and a piecewise linear transform was adopted to describe the dependency between the two feature streams. The transform matrices were trained for each HMM state and tied according to each state's context. In this paper, we propose an improved acoustic-articulatory modelling method. A Gaussian mixture model (GMM) is introduced to model the articulatory space, and the cross-stream transform matrices are trained for each Gaussian mixture component rather than context-dependently. The dependency between the two streams can therefore vary flexibly as the articulatory features change. Our results show that this method makes control over vowel quality through modification of articulatory trajectories more effective, without degrading naturalness.
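
To illustrate the idea of tying cross-stream transforms in the articulatory feature space, the following Python sketch computes the acoustic mean for one frame as the HMM state mean plus a shift predicted from the articulatory features. The shift is a combination of per-component linear transforms, weighted by the GMM posterior of the articulatory frame. This is a minimal sketch under assumptions, not the authors' implementation: the abstract does not give the exact model form, so the posterior weighting, the absence of a bias term, and all names (`gmm_posteriors`, `acoustic_mean`, `transforms`) are hypothetical.

```python
import numpy as np


def gmm_posteriors(y, weights, means, covs):
    """Return P(k | y) for each GMM component k, given one articulatory frame y."""
    dim = len(y)
    log_p = np.empty(len(weights))
    for k, (w, m, c) in enumerate(zip(weights, means, covs)):
        diff = y - m
        _, logdet = np.linalg.slogdet(c)
        maha = diff @ np.linalg.solve(c, diff)
        # Log of (component weight * Gaussian density) for component k.
        log_p[k] = np.log(w) - 0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)
    # Normalise in the log domain for numerical stability.
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()


def acoustic_mean(y, state_mean, transforms, weights, means, covs):
    """Assumed model form: state mean + sum_k P(k | y) * (A_k @ y).

    The transforms A_k are indexed by GMM component (feature-space tying),
    not by the context of the HMM state.
    """
    post = gmm_posteriors(y, weights, means, covs)
    shift = sum(p * (A @ y) for p, A in zip(post, transforms))
    return state_mean + shift


if __name__ == "__main__":
    # Toy dimensions: 2-D articulatory frame, 3-D acoustic mean, 2 components.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [10.0, 10.0]])
    covs = np.array([np.eye(2), np.eye(2)])
    transforms = np.array([np.ones((3, 2)), -np.ones((3, 2))])
    y = np.array([0.1, 0.2])
    print(acoustic_mean(y, np.zeros(3), transforms, weights, means, covs))
```

In this sketch the transforms are selected by where the articulatory frame lies in the GMM, so moving an articulatory trajectory into a different region of the space changes which transforms dominate the predicted acoustic mean.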

Bibliographic reference. Ling, Zhen-Hua / Richmond, Korin / Yamagishi, Junichi (2011): "Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis", in INTERSPEECH-2011, 117-120.