EUROSPEECH 2001 Scandinavia
This paper describes the incorporation of a visual lip-tracking and lipreading algorithm, based on affine-invariant Fourier descriptors computed from parametric lip contours, to improve audio-visual speech recognition. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), whose outputs are combined by an optimal decision rule into a joint decision. This work describes the extraction of affine-invariant Fourier descriptors (AI-FDs) from parametric lip-contour data. Finally, it validates optimal weight selection, based on the noise type and signal-to-noise ratio (SNR), for joint audio-visual automatic speech recognition (JAV-ASR).
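As a rough illustration of the descriptor-extraction step, the sketch below computes Fourier descriptors of a sampled lip contour and normalizes them for translation, scale, rotation, and starting-point invariance. This is a hypothetical simplification: the paper's affine-invariant normalization (full AI-FDs) is more involved, and the function name and parameters here are illustrative, not from the paper.

```python
import numpy as np

def fourier_descriptors(contour, n_desc=10):
    """Simplified contour Fourier descriptors (illustrative sketch,
    not the paper's full affine-invariant formulation).

    contour: (N, 2) array of (x, y) lip-contour points sampled
             along the parametric lip curve.
    Returns n_desc invariant descriptor magnitudes.
    """
    # Represent the contour as a complex signal z = x + i*y.
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z)
    # Zero the DC coefficient: removes dependence on translation.
    coeffs[0] = 0.0
    # Magnitudes discard phase: removes rotation and start-point effects.
    mags = np.abs(coeffs)
    # Normalize by the first harmonic: removes uniform scale.
    mags = mags / mags[1]
    return mags[2:2 + n_desc]
```

Under this normalization, a contour and its translated/scaled copy yield identical descriptors, which is the property that makes such features attractive for lipreading under camera motion.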
Bibliographic reference. Gurbuz, Sabri / Patterson, Eric K. / Tufekci, Zekeriya / Gowdy, John N. (2001): "Lip-reading from parametric lip contours for audio-visual speech recognition", In EUROSPEECH-2001, 1181-1184.