Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Learning Optimal Audiovisual Phasing for an HMM-based Control Model for Facial Animation

Oxana Govokhina (1,2), Gérard Bailly (1), Gaspard Breton (2)

(1) GIPSA-Lab Dept. Speech & Cognition, CNRS/INPG/UJF & Univ. Stendhal, Grenoble, France
(2) France Telecom R&D, Cesson-Sévigné, France

We propose an HMM-based trajectory formation system that predicts the articulatory trajectories of a talking face from phonetic input. To add flexibility to the acoustic/gestural alignment and to account for anticipatory gestures, we developed a phasing model that predicts the delays between the acoustic boundaries of the allophones to be synthesized and the gestural boundaries of the HMM triphones. The HMM triphones and the phasing model are trained jointly using an iterative analysis-synthesis loop, which converges within a few iterations. We demonstrate that the phasing model significantly reduces the prediction error and captures subtle context-dependent anticipatory phenomena.
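The joint training scheme can be made concrete with a small, self-contained sketch. This is not the authors' implementation: the triphone HMMs are stood in for by a single mean per allophone segment, forced alignment by a local boundary search, and the phasing model by a per-context mean delay; all names (fit_segment_means, realign_boundaries, train_phasing) and the synthetic data are hypothetical. The sketch only illustrates the structure of the loop: train the synthesis model on the current gestural segmentation, re-estimate the gestural boundaries by analysis, refit the phasing model on the boundary delays, and iterate to convergence.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_segment_means(traj, bounds):
    """Synthesis-model stand-in: one mean per segment, i.e. a degenerate
    single-state model per allophone (the paper trains triphone HMMs)."""
    edges = np.concatenate(([0], bounds, [len(traj)]))
    return [traj[a:b].mean() for a, b in zip(edges[:-1], edges[1:])]

def realign_boundaries(traj, bounds, means, search=5):
    """Analysis-step stand-in: move each gestural boundary within a small
    window to minimize reconstruction error, mimicking forced alignment of
    the trained models against the articulatory trajectories."""
    edges = np.concatenate(([0], bounds, [len(traj)]))
    new = bounds.copy()
    for i in range(len(bounds)):
        lo = max(edges[i] + 1, bounds[i] - search)
        hi = min(edges[i + 2] - 1, bounds[i] + search)
        cands = np.arange(lo, hi + 1)
        errs = [np.sum((traj[edges[i]:c] - means[i]) ** 2) +
                np.sum((traj[c:edges[i + 2]] - means[i + 1]) ** 2)
                for c in cands]
        new[i] = cands[int(np.argmin(errs))]
    return new

def train_phasing(contexts, delays):
    """Phasing-model stand-in: mean delay per context label (the paper's
    model is richer and is trained jointly with the HMMs)."""
    acc = {}
    for c, d in zip(contexts, delays):
        acc.setdefault(c, []).append(d)
    return {c: float(np.mean(ds)) for c, ds in acc.items()}

# Synthetic utterance: piecewise-constant "articulation" whose true gestural
# boundaries lead the acoustic ones by 4 frames (anticipation).
acoustic = np.array([40, 80, 120])          # acoustic allophone boundaries
gestural_true = acoustic - 4                # gestures anticipate the acoustics
levels = [0.0, 1.0, 0.2, 0.8]
lengths = np.diff(np.concatenate(([0], gestural_true, [160])))
traj = np.concatenate([np.full(n, v) for v, n in zip(levels, lengths)])
traj += 0.05 * rng.standard_normal(traj.size)
contexts = ["a|b", "b|a", "a|b"]            # hypothetical context labels

bounds = acoustic.copy()                    # initialize gestural = acoustic
for _ in range(10):
    means = fit_segment_means(traj, bounds)             # "train" the models
    new = realign_boundaries(traj, bounds, means)       # analysis step
    phasing = train_phasing(contexts, new - acoustic)   # update phasing model
    if np.array_equal(new, bounds):                     # converged
        break
    bounds = new

print("learned per-context delays:", phasing)   # ~ -4 frames per context
```

On this toy utterance the loop converges in a few iterations and recovers the 4-frame anticipatory lead as negative per-context delays, mirroring the structure, though not the models, of the training scheme described above.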


Bibliographic reference. Govokhina, Oxana / Bailly, Gérard / Breton, Gaspard (2007): "Learning optimal audiovisual phasing for an HMM-based control model for facial animation", in SSW6-2007, 1-4.