Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Human Processing of Auditory-Visual Information in Speech Perception: Potential for Multimodal Human-Machine Interfaces

Patricia K. Kuhl (1), Minoru Tsuzaki (2), Yoh'ichi Tohkura (2), Andrew N. Meltzoff (3)

(1) Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
(2) Advanced Telecommunication Research Laboratories International, Kyoto, Japan
(3) Department of Psychology, University of Washington, Seattle, WA, USA

Speech perception is not a unimodal process. Observers who see and hear a talker take both auditory and visual information into account in determining what the talker said. This is best illustrated by experiments in which discrepant speech information is delivered to the two modalities. In this situation, observers perceive neither the syllable sent to the auditory modality nor the syllable sent to the visual modality, but a combination of the two. Recent research in our laboratories has established two additional facts. First, the auditory-visual effect occurs in both American and Japanese subjects. Moreover, our studies show that the effect interacts with the language spoken by the talker one watches. Japanese subjects show significantly stronger auditory-visual effects when watching a foreign-language speaker than when watching a native-language speaker. In American subjects, this difference is less pronounced. Second, we have found that only minimal visual information is needed to produce these auditory-visual effects. Data from human observers have implications for human-machine interfaces that utilize multimedia technology.


Bibliographic reference. Kuhl, Patricia K. / Tsuzaki, Minoru / Tohkura, Yoh'ichi / Meltzoff, Andrew N. (1994): "Human processing of auditory-visual information in speech perception: potential for multimodal human-machine interfaces", in Proc. ICSLP 1994, 539-542.