In this paper, we describe two architectures for combining automatic lip-reading and acoustic speech recognition. We propose a model which can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent task. This is achieved with a hybrid system based on two HMMs, trained on acoustic and visual data respectively. Both architectures have been tested on degraded audio over a wide range of S/N ratios. The results of these experiments are presented and discussed.
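The abstract does not specify the fusion rule, but a common late-fusion scheme for such hybrid systems combines the per-word log-likelihoods of the two HMMs with an SNR-dependent weight. The following is a minimal sketch under that assumption; the function names and the linear weighting are illustrative, not the authors' published method.

```python
# Hedged sketch of late fusion between an acoustic HMM and a visual
# (lip-reading) HMM. The linear log-likelihood weighting below is an
# illustrative assumption, not the fusion rule from the paper.

def fuse_scores(audio_ll, visual_ll, audio_weight):
    """Combine per-word log-likelihoods from the two HMMs.

    audio_ll / visual_ll: dicts mapping word -> log-likelihood.
    audio_weight: value in [0, 1]; typically lowered as the S/N
    ratio degrades so the visual stream dominates in noise.
    """
    return {
        word: audio_weight * audio_ll[word]
              + (1.0 - audio_weight) * visual_ll[word]
        for word in audio_ll
    }

def recognize(audio_ll, visual_ll, audio_weight):
    """Return the word with the highest fused score."""
    fused = fuse_scores(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)

# In clean audio the acoustic HMM is trusted; in heavy noise the
# visual HMM takes over.
audio_ll = {"one": -10.0, "two": -14.0}
visual_ll = {"one": -13.0, "two": -11.0}
print(recognize(audio_ll, visual_ll, audio_weight=0.9))  # one
print(recognize(audio_ll, visual_ll, audio_weight=0.1))  # two
```

With a high audio weight the acoustic evidence wins; with a low weight (simulating a poor S/N ratio) the decision flips to the visually preferred word, which is the behavior an audio-visual recognizer needs in degraded audio.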
Bibliographic reference: Adjoudani, A. / Benoît, Christian (1995): "Audio-visual speech recognition compared across two architectures", in EUROSPEECH-1995, pp. 1563-1566.