ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
Audio Visual Speech Recognition and Segmental Master Slave HMM
Regine André-Obrecht, Bruno Jacob, Nathalie Parlangeau
IRIT-University Paul Sabatier-CNRS UMR 5505, Toulouse, France
Our work deals with the classical problem of merging
heterogeneous and asynchronous parameters. It is well known
that lip reading improves the speech recognition score,
especially in noisy conditions, so we study more precisely the
modeling of acoustic and labial parameters and propose two
Automatic Speech Recognition systems:
- Direct Identification, performed with a classical HMM
approach: no correlation between visual and acoustic
parameters is assumed.
- Two correlated models: a master HMM and a slave HMM, which
process the labial observations and the acoustic
observations respectively.
To assess each approach, we use a segmental pre-processing
and a robust acoustic elementary unit, the "pseudodiphone".
Our task is the recognition of spelled French
letters in clean and noisy (cocktail party) environments.
Whatever the approach and condition, the introduction of
labial features improves performance, but the difference
between the two models is not sufficient to provide
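
As an illustration of the master/slave idea only, the Python sketch below computes the joint likelihood of a labial and an acoustic observation sequence with a forward recursion over (master state, slave state) pairs. It is not the authors' implementation. It assumes discrete observation symbols, frame-synchronous streams (no segmental pre-processing and no pseudodiphones), and a slave HMM whose initial, transition, and emission probabilities are indexed by the current master state. All names and numeric values are hypothetical.

```python
from itertools import product


def master_slave_likelihood(pi_m, A_m, B_m, pi_s, A_s, B_s, labial, acoustic):
    """Joint likelihood P(labial, acoustic) under a toy master/slave HMM.

    Master HMM (labial stream):
        pi_m[i], A_m[i][j], B_m[i][v]
    Slave HMM (acoustic stream), every table indexed by the master state first:
        pi_s[i][k], A_s[j][k][l], B_s[j][l][a]
    Assumption: both streams have one symbol per frame (frame-synchronous).
    """
    n_master, n_slave = len(pi_m), len(pi_s[0])
    pairs = list(product(range(n_master), range(n_slave)))

    # Initialisation: the slave's initial law depends on the master state.
    alpha = {
        (i, k): pi_m[i] * B_m[i][labial[0]] * pi_s[i][k] * B_s[i][k][acoustic[0]]
        for i, k in pairs
    }

    # Recursion: the master moves i -> j, then the slave moves k -> l
    # using the transition matrix selected by the new master state j.
    for v, a in zip(labial[1:], acoustic[1:]):
        alpha = {
            (j, l): B_m[j][v] * B_s[j][l][a] * sum(
                alpha[i, k] * A_m[i][j] * A_s[j][k][l] for i, k in pairs
            )
            for j, l in pairs
        }

    return sum(alpha.values())


# Hypothetical toy parameters: 2 master states, 2 slave states,
# 2 labial symbols, 2 acoustic symbols.
pi_m = [0.6, 0.4]
A_m = [[0.7, 0.3], [0.2, 0.8]]
B_m = [[0.9, 0.1], [0.3, 0.7]]
pi_s = [[0.5, 0.5], [0.1, 0.9]]
A_s = [[[0.6, 0.4], [0.5, 0.5]], [[0.2, 0.8], [0.3, 0.7]]]
B_s = [[[0.8, 0.2], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]]]

likelihood = master_slave_likelihood(
    pi_m, A_m, B_m, pi_s, A_s, B_s, labial=[0, 1, 1], acoustic=[0, 0, 1]
)
print(likelihood)
```

In this sketch, the direct identification baseline would correspond to a single HMM over the concatenated labial and acoustic observation, whereas the master/slave arrangement lets the labial state select which acoustic model is active.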
André-Obrecht, Regine / Jacob, Bruno / Parlangeau, Nathalie (1997):
"Audio visual speech recognition and segmental master slave HMM",
In AVSP-1997, 49-52.