Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Audio-Visual Speech Recognition Compared Across Two Architectures

A. Adjoudani, Christian Benoît

Institut de la Communication Parlée, Unité de Recherche Associée au CNRS N° 368, INPG/ENSERG - Université Stendhal, Grenoble, France

In this paper, we describe two architectures for combining automatic lip-reading and acoustic speech recognition. We propose a model that can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent setting. This is achieved with a hybrid system based on two HMMs, one trained on auditory data and the other on visual data. Both architectures were tested on degraded audio over a wide range of signal-to-noise (S/N) ratios. The results of these experiments are presented and discussed.
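
The abstract does not state how the outputs of the two HMMs are combined. As an illustration only, one common way to merge two separately trained recognizers is to take a weighted sum of their per-word log-likelihoods and pick the best-scoring word. The sketch below assumes that scheme; the function names, the `audio_weight` parameter, and the idea of lowering the weight as the audio degrades are hypothetical and are not taken from the paper.

```python
def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-word log-likelihoods from two recognizers.

    audio_loglik / visual_loglik: dicts mapping each vocabulary word to the
    log-likelihood assigned by the (assumed) audio HMM and visual HMM.
    audio_weight: value in [0, 1]; the visual stream gets 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both recognizers must score the same vocabulary")
    return {
        word: audio_weight * audio_loglik[word]
        + (1.0 - audio_weight) * visual_loglik[word]
        for word in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, audio_weight):
    """Return the word with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)
```

With made-up scores such as `{"one": -120.0, "nine": -118.0}` for audio and `{"one": -40.0, "nine": -55.0}` for video, a high `audio_weight` lets the audio stream decide, while a lower one lets the visual stream override it, which is the kind of behavior one would want as the S/N ratio drops.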


Bibliographic reference.  Adjoudani, A. / Benoît, Christian (1995): "Audio-visual speech recognition compared across two architectures", In EUROSPEECH-1995, 1563-1566.