13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Speaker-adaptive Visual Speech Synthesis in the HMM-framework

Dietmar Schabus (1,2), Michael Pucher (1), Gregor Hofer (1)

(1) FTW Telecommunications Research Center Vienna, Austria
(2) Graz University of Technology, Graz, Austria

In this paper, we apply speaker-adaptive and speaker-dependent training of hidden Markov models (HMMs) to visual speech synthesis. In speaker-dependent training, we use data from one speaker to train a visual and acoustic HMM. In speaker-adaptive training, a visual background model (average voice) is first trained on data from multiple speakers. This background model is then adapted to a new target speaker using (a small amount of) data from that speaker. This concept has been applied successfully to acoustic speech synthesis. This paper demonstrates how model adaptation can be applied in the visual domain to synthesize animations of talking faces. A perceptual evaluation shows that speaker-adaptive modeling outperforms speaker-dependent modeling for small amounts of training/adaptation data.

Index Terms: Visual speech synthesis, speaker-adaptive training, facial animation
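
The adaptation step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the adaptation algorithm, so the code below substitutes a standard maximum a posteriori (MAP) update of a single Gaussian mean as a stand-in; it is not the authors' method. The function names, the prior weight `tau`, and the two-dimensional "visual feature" frames are all hypothetical and chosen only for illustration.

```python
def map_adapt_mean(background_mean, target_frames, tau=10.0):
    """MAP update of a Gaussian mean (illustrative stand-in, not the paper's method).

    background_mean: mean vector of the average-voice (background) model.
    target_frames:   list of feature vectors from the target speaker.
    tau:             assumed prior weight; larger values trust the background more.
    """
    n = len(target_frames)
    dim = len(background_mean)
    # Per-dimension sum of the target speaker's observations.
    sums = [sum(frame[d] for frame in target_frames) for d in range(dim)]
    # Interpolate between the background mean and the target data.
    return [(tau * background_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


def speaker_dependent_mean(target_frames):
    """Plain sample mean, i.e. an estimate from the target speaker's data alone."""
    n = len(target_frames)
    dim = len(target_frames[0])
    return [sum(frame[d] for frame in target_frames) / n for d in range(dim)]


if __name__ == "__main__":
    background = [0.0, 0.0]               # hypothetical average-voice mean
    frames = [[1.2, -0.4], [0.8, -0.6]]   # hypothetical target-speaker frames
    print("adapted:          ", map_adapt_mean(background, frames, tau=2.0))
    print("speaker-dependent:", speaker_dependent_mean(frames))
```

With only a few target frames, the adapted mean stays close to the background model, and it moves toward the speaker-dependent estimate as more frames are added. This mirrors the intuition behind the abstract's finding for small amounts of data, although the sketch itself makes no claim about perceptual quality.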

Bibliographic reference: Schabus, Dietmar / Pucher, Michael / Hofer, Gregor (2012): "Speaker-adaptive visual speech synthesis in the HMM-framework", in INTERSPEECH-2012, 979-982.