4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Audiovisual Speech Recognition Using Multiscale Nonlinear Image Decomposition

I. A. Matthews, J. Bangham, S. J. Cox

School of Information Systems, University of East Anglia, Norwich, UK

There has recently been increasing interest in enhancing speech recognition with visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a 'sieve', for visual speech recognition. Information derived from the mouth region is used in visual and audiovisual speech recognition on a database of the letters A-Z spoken by four talkers. A scale histogram is generated directly from the grayscale pixels of a window containing the talker's mouth, on a per-frame basis. Results are presented for the visual-only, audio-only, and a simple audiovisual case.
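The scale histogram described above can be illustrated in one dimension. The sketch below is a toy reconstruction, not the authors' implementation: the paper's sieve operates on image data and the exact filter family (e.g. recursive median, M- or N-sieves) is not specified in the abstract, so the alternating opening/closing stage and all function names here are illustrative assumptions. The idea it demonstrates is the general sieve principle: filters of increasing scale progressively remove signal extrema, and the amplitude removed at each scale fills one histogram bin.

```python
import numpy as np

def _erode(x, w):
    # running minimum over a flat window of length w (edge-padded)
    left = w // 2
    xp = np.pad(x, (left, w - 1 - left), mode="edge")
    return np.array([xp[i:i + w].min() for i in range(len(x))])

def _dilate(x, w):
    # running maximum over the reflected window (adjoint of _erode)
    left = (w - 1) // 2
    xp = np.pad(x, (left, w - 1 - left), mode="edge")
    return np.array([xp[i:i + w].max() for i in range(len(x))])

def scale_histogram(signal, max_scale):
    """Toy 1-D sieve (illustrative, not the paper's exact filter):
    alternate a grayscale opening and closing at increasing scale s;
    the total amplitude removed at scale s fills histogram bin s."""
    x = signal.astype(float)
    hist = np.zeros(max_scale)
    for s in range(1, max_scale + 1):
        w = s + 1
        # opening removes positive extrema of extent <= s samples
        opened = _dilate(_erode(x, w), w)
        # closing then removes negative extrema of extent <= s samples
        closed = _erode(_dilate(opened, w), w)
        hist[s - 1] = np.abs(x - closed).sum()  # granule amplitude at scale s
        x = closed  # pass the residual signal to the next scale
    return hist

# A pulse two samples wide is removed at scale 2, so bin 2 holds
# its total amplitude (2 samples x height 5 = 10).
print(scale_histogram(np.array([0, 0, 5, 5, 0, 0, 0]), 4))
```

In the paper's setting, each video frame of the mouth window would be decomposed this way and the per-scale granule counts accumulated into the feature vector for that frame.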


Bibliographic reference.  Matthews, I. A. / Bangham, J. / Cox, S. J. (1996): "Audiovisual speech recognition using multiscale nonlinear image decomposition", In ICSLP-1996, 38-41.