Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Auditory/Visual Speech in Multimodal Human Interfaces

Dominic W. Massaro, Michael M. Cohen

Program in Experimental Psychology, University of California, Santa Cruz, CA, USA

It has long been a hope, expectation, and prediction that speech would be the primary medium of communication between humans and machines. To date, this dream has not been realized. We predict that exploiting the multimodal nature of spoken language will facilitate the use of this medium. We begin our paper with a general framework for the analysis of speech recognition by humans and a theoretical model. We then present a system for auditory/visual speech synthesis that performs complete text-to-speech synthesis. This system should improve the quality as well as the attractiveness of speech as one of a machine's primary output communication media. Mirroring the value of multimodal speech synthesis, multimodal channels should also enhance speech recognition by machine.


Bibliographic reference: Massaro, Dominic W. / Cohen, Michael M. (1994): "Auditory/visual speech in multimodal human interfaces", in ICSLP-1994, 531-534.