International Conference on Auditory-Visual Speech Processing 2008
Tangalooma Wild Dolphin Resort,
Moreton Island, Queensland, Australia
A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper, we show that instead of being treated as a separate modelling technique, FHMMs can be adopted as a novel method of training synchronous hidden Markov models (SHMMs). SHMMs are traditionally trained jointly on both the acoustic and visual modalities. While this approach has worked well for speech-recognition applications, limitations of the adaptation algorithms in the commonly used HMM Toolkit software have stymied the use of SHMMs in speaker-recognition scenarios. Our FHMM adaptation method can adapt the multi-stream models directly from single-stream audio HMMs, allowing both background and speaker-dependent SHMMs to be generated easily. However, experiments conducted on the XM2VTS database show that there does not appear to be any advantage in the frame-level fusion available through the FHMM-adapted SHMMs over the simpler approach of output score fusion of unimodal HMM classifiers.
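The output score fusion baseline mentioned above can be illustrated with a minimal sketch. The abstract does not state which fusion rule was used, so the weighted sum below is an assumption, as are the function names, the weight `audio_weight`, and the decision `threshold`. The inputs are assumed to be verification scores (for example, log-likelihood ratios of a speaker model against a background model) that have already been normalised to comparable ranges.

```python
def fuse_scores(audio_score, video_score, audio_weight=0.5):
    """Weighted-sum fusion of two unimodal verification scores.

    Hypothetical helper: the weighted sum and the default weight are
    illustrative assumptions, not the fusion rule used in the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


def verify(audio_score, video_score, threshold=0.0, audio_weight=0.5):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fuse_scores(audio_score, video_score, audio_weight) >= threshold
```

In this scheme each modality is scored by its own classifier over the whole utterance and the two scores are combined only at the end, whereas the SHMM approach combines the audio and visual streams at every frame inside a single model.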
Bibliographic reference. Dean, David / Sridharan, Sridha (2008): "Fused HMM adaptation of synchronous HMMs for audio-visual speaker verification", In AVSP-2008, 137-141.