FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and
Auditory-Visual Speech Processing

Vienna, Austria
September 11-13, 2015

HMM-Based Visual Speech Synthesis Using Dynamic Visemes

Ausdang Thangthai, Barry-John Theobald

School of Computing Sciences, University of East Anglia, Norwich, UK

In this paper we incorporate dynamic visemes into hidden Markov model (HMM)-based visual speech synthesis. Dynamic visemes represent intuitive visual gestures identified automatically by clustering purely visual speech parameters. They have the advantage of spanning multiple phones, so the effects of visual coarticulation are captured explicitly within the unit. The previous application of dynamic visemes to synthesis used a sample-based approach, in which cluster centroids were concatenated to form parameter trajectories corresponding to novel visual speech. In this paper we generalize the use of these units to create more flexible and dynamic animation using an HMM-based synthesis framework. We show, using objective and subjective testing, that an HMM synthesizer trained using dynamic visemes can generate better visual speech than HMM synthesizers trained using either phone or traditional viseme units.

Index Terms: visual speech synthesis, hidden Markov model, dynamic visemes
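As an illustrative aside, the clustering step the abstract describes (identifying dynamic visemes by clustering purely visual speech parameters) could be sketched roughly as below. This is a minimal sketch, not the paper's implementation: it assumes gesture segments have already been extracted from the video, substitutes k-means (via scikit-learn) for whatever clustering procedure the paper actually uses, and all names and parameter values (cluster_dynamic_visemes, n_clusters, segment_len) are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_dynamic_visemes(segments, n_clusters=150, segment_len=20):
        """Cluster variable-length visual gesture segments into classes.

        segments: list of (T_i, D) arrays of visual speech parameters
                  (e.g., appearance-model coefficients per video frame).
        Segments of different durations are resampled to a fixed length
        so they can be compared, then flattened for clustering.
        """
        resampled = []
        for seg in segments:
            t_old = np.linspace(0.0, 1.0, len(seg))
            t_new = np.linspace(0.0, 1.0, segment_len)
            # Linearly resample each parameter dimension to segment_len frames.
            res = np.column_stack(
                [np.interp(t_new, t_old, seg[:, d]) for d in range(seg.shape[1])]
            )
            resampled.append(res.ravel())
        X = np.asarray(resampled)

        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
        # Centroids reshaped back to (segment_len, D) give a canonical
        # parameter trajectory for each dynamic-viseme class.
        D = segments[0].shape[1]
        centroids = km.cluster_centers_.reshape(n_clusters, segment_len, D)
        return km.labels_, centroids

Under these assumptions, concatenating centroid trajectories for a predicted unit sequence would approximate the earlier sample-based synthesis the abstract mentions; the paper's contribution is instead to train HMMs on these units.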


Bibliographic reference. Thangthai, Ausdang / Theobald, Barry-John (2015): "HMM-based visual speech synthesis using dynamic visemes", In FAAVSP-2015, 88-92.