Auditory-Visual Speech Processing 2005
British Columbia, Canada
Animated agents are increasingly used in speech science research and applications. An important challenge is to evaluate an agent's effectiveness in terms of the intelligibility of its visible speech. Sumby and Pollack (1954) proposed a metric that describes the benefit provided by the face relative to auditory speech presented alone. We extend this metric to describe the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. The validity of the metric is tested in a new experiment in which auditory speech is presented at five noise levels and paired with either our synthetic talker, Baldi, or a natural talker (the standard). A valid metric would allow direct comparisons across experiments. It would also measure the benefit of a synthetic animated face relative to a natural face, and show how this benefit varies with the type of synthetic face, the test items (e.g., syllables versus sentences, viseme class), different individuals, and applications.
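The abstract does not state the formulas, so the following is only a minimal sketch of how the two measures could be computed. It assumes the commonly cited form of the Sumby and Pollack measure, (bimodal - auditory) / (1 - auditory), with scores expressed as proportions correct. It also assumes that the extension is the ratio of the synthetic face's gain to the natural face's gain over audio alone. The function names and the example scores are hypothetical and are not taken from the experiment.

```python
def visual_benefit(bimodal: float, auditory: float) -> float:
    """Gain from adding a face, as a fraction of the room left above
    the audio-alone score (assumed Sumby-Pollack form).
    Scores are proportions correct in [0, 1]."""
    if not 0.0 <= auditory < 1.0:
        raise ValueError("auditory score must be in [0, 1)")
    return (bimodal - auditory) / (1.0 - auditory)


def relative_synthetic_benefit(synthetic: float, natural: float,
                               auditory: float) -> float:
    """Gain from the synthetic face divided by the gain from the
    natural face, both measured against audio alone (assumed form
    of the extended metric)."""
    natural_gain = natural - auditory
    if natural_gain <= 0.0:
        raise ValueError("natural face must improve on audio alone")
    return (synthetic - auditory) / natural_gain


# Hypothetical proportions correct at a single noise level.
auditory_alone = 0.40
with_natural_face = 0.80
with_synthetic_face = 0.70

print(visual_benefit(with_natural_face, auditory_alone))
print(relative_synthetic_benefit(with_synthetic_face,
                                 with_natural_face, auditory_alone))
```

Under these assumptions, a relative value of 1 would mean the synthetic face helps as much as the natural face, and a value of 0 would mean it adds nothing to audio alone. Computing the value separately at each noise level is one way to check whether it stays stable across conditions.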
Bibliographic reference. Ouni, Slim / Cohen, Michael M. / Ishak, Hope / Massaro, Dominic W. (2005): "Visual contribution to speech perception: measuring the intelligibility of talking heads", In AVSP-2005, 45-46.