ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
This paper deals with a noisy speech enhancement technique based on the fusion of auditory and visual information. We first relate this approach to experimental data suggesting the existence of an "audiovisual scene analysis module". We then present its implementation for vowel-consonant-vowel (VCV) transitions corrupted with white noise (four vowels and six plosives). A first evaluation of the system in this context is presented, including informal listening tests, distance measures and Gaussian classification scores. The results show that the vocalic parts of the signals are well enhanced, whereas the consonantal parts are not yet improved by the procedure. We suggest a possible direction for addressing this problem.
Bibliographic reference. Girin, L. / Schwartz, Jean-Luc / Feng, G. (1997): "Can the visual input make the audio signal 'pop out' in noise? A first study of the enhancement of noisy VCV acoustic sequences by audio-visual fusion", In AVSP-1997, 37-40.