Second ESCA/IEEE Workshop on Speech Synthesis
September 12-15, 1994
Analytic measurement of visual parameters relevant to the labial production of speech, as well as real-time 3D computer-animated models of the lips and of the face, have been implemented on two coupled computers, so that the synthetic lips alone or a whole facial model can mimic online (or play back) the actual gestures of a natural speaker. The geometric measurements of the speaker's lips and jaw are made through image processing of the front and profile views of the speaker's face. Data are transmitted to a display computer through a control interface which delivers the proper parameters to drive the animation of the 3D models. The lip model uses five control parameters; the facial model adds a sixth: jaw lowering. At present, the tongue is not controlled. We present here the real-time techniques used for analysis, for animation of the 3D models, and for synchronization of the two processes. Finally, we evaluate the bimodal intelligibility of speech under five levels of acoustic degradation by added noise, comparing the intelligibility of the speech signal presented alone, with the lip model, with the facial model, and with the original speaker's face. Our results confirm the importance of visual information in the perception of speech: the whole natural face restores two thirds of the missing auditory intelligibility when the acoustic transmission is degraded or missing; the facial model (tongue movements excluded) restores half of it; and the lip model alone restores a third of it.
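The "restored intelligibility" figures can be read as a simple ratio: the visual gain in identification score divided by the intelligibility lost to acoustic degradation. A minimal sketch of that computation, using hypothetical identification scores (the function name and the example values are illustrative, not taken from the paper):

```python
def restored_fraction(audio_only: float, audiovisual: float) -> float:
    """Fraction of the missing auditory intelligibility restored by vision.

    Both arguments are correct-identification rates in [0, 1]; the
    denominator is the intelligibility lost to acoustic degradation.
    """
    missing = 1.0 - audio_only
    if missing == 0.0:
        return 0.0  # nothing left to restore with perfect audio
    return (audiovisual - audio_only) / missing

# Hypothetical scores at one noise level (not the paper's data):
# audio alone 40% correct; with a visual display, 80 / 70 / 60% correct.
print(round(restored_fraction(0.40, 0.80), 2))  # natural face  -> 0.67
print(round(restored_fraction(0.40, 0.70), 2))  # facial model  -> 0.5
print(round(restored_fraction(0.40, 0.60), 2))  # lip model     -> 0.33
```

With these invented scores the three displays would restore two thirds, half, and a third of the missing intelligibility, matching the pattern the abstract reports across noise levels.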
Bibliographic reference. Le Goff, B. / Guiard-Marigny, Thierry / Cohen, M. / Benoît, Christian (1994): "Real-time analysis-synthesis and intelligibility of talking faces", In SSW2-1994, 53-56.