ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
This paper describes a novel approach for weighting the contribution of the acoustic and visual sources of information in a bimodal connected speech recognition system. We consider that a different acoustic-labial weight is attached to each recognition unit. The values of the weighting vector are optimised in order to minimise the error rate on a learning set. Experiments are performed on a two-speaker audio-visual database, composed of connected letters, with two different acoustic-labial speech recognition systems. For both speakers and both systems, weight optimisation increases the recognition rate of our bimodal system.
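The per-unit weighting described in the abstract can be sketched as a convex combination of the acoustic and visual stream log-likelihoods, with one weight per recognition unit. The function name, the dictionary of per-word weights, and the specific values below are illustrative assumptions, not the paper's actual implementation:

```python
import math

def combined_log_score(log_p_audio, log_p_visual, weight):
    """Weighted combination of per-stream log-likelihoods.

    `weight` is the acoustic-labial weight attached to one
    recognition unit; the visual stream receives (1 - weight).
    """
    return weight * log_p_audio + (1.0 - weight) * log_p_visual

# Hypothetical per-word weights: one value per recognition unit,
# tuned on a learning set to minimise the error rate.
unit_weights = {"A": 0.7, "B": 0.5}

score = combined_log_score(math.log(0.2), math.log(0.05), unit_weights["A"])
```

In such a scheme the weight vector would be adjusted on held-out learning data so that units whose visual cues are more discriminative (e.g. letters with distinctive lip shapes) lean more on the visual stream.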
Bibliographic reference. Jourlin, Pierre (1997): "Word-dependent acoustic-labial weights in HMM-based speech recognition", In AVSP-1997, 69-72.