EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Predicting Visual Consonant Perception from Physical Measures

Jintao Jiang (1), Abeer Alwan (1), Edward T. Auer (2), Lynne E. Bernstein (2)

(1) University of California at Los Angeles, USA
(2) House Ear Institute, Los Angeles, USA

The long-term goal of our work is to predict visual confusion matrices from physical measurements. In this study, four talkers recorded 69 American English consonant-vowel (CV) syllables while audio, video, and facial movements were captured. During recording, 20 markers were placed on the face, and an optical Qualisys system tracked the three-dimensional facial movements. The videotapes (showing the markers on the face, without sound) were presented to normal-hearing viewers with average or above-average lipreading ability, and visual confusion matrices were obtained. The facial measurements correlated with the visual perception data at about 0.79, accounting for about 63% of the variance.
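
The link between the reported correlation and the share of variance explained can be illustrated with a minimal sketch. It assumes the correlation is a Pearson coefficient, in which case the variance accounted for is its square. The function names and the toy values below are hypothetical and are not taken from the paper; squaring the rounded figure of 0.79 gives roughly 0.62, close to the reported 63%, with the small gap presumably due to rounding.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def variance_explained(r):
    """Proportion of variance accounted for by a Pearson correlation r."""
    return r * r


# Toy, made-up values for illustration only (NOT data from the paper):
# a physical distance between syllable pairs and a perceptual dissimilarity.
physical_distance = [0.2, 0.5, 0.9, 1.4, 2.0]
perceptual_dissimilarity = [0.1, 0.4, 0.5, 0.9, 1.0]

r_toy = pearson_r(physical_distance, perceptual_dissimilarity)
print("toy r:", r_toy, "toy r^2:", variance_explained(r_toy))

# Squaring the rounded correlation quoted in the abstract.
print(round(variance_explained(0.79), 2))  # 0.62
```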

Bibliographic reference. Jiang, Jintao / Alwan, Abeer / Auer, Edward T. / Bernstein, Lynne E. (2001): "Predicting visual consonant perception from physical measures", in EUROSPEECH-2001, 179-182.