Auditory-Visual Speech Processing (AVSP) 2009
University of East Anglia, Norwich, UK
We propose a new method for estimating the area of mouth opening in a video sequence of a speaking person. In a paper published in 2000, Grant and Seitz reported different degrees of correlation between acoustic envelopes and visible movements. In our method, we exploit these correlations to build a mathematical model of a Single-Input Multiple-Output (SIMO) system in which the area of mouth opening is the unknown single input to be estimated. The subband Root Mean Squared (RMS) energies of the speech signal are the observable multiple outputs of the model. The unknown input signal can then be estimated directly with existing blind deconvolution techniques. Our method requires only the audio sequence to estimate the area of mouth opening in the corresponding video sequence. Consequently, it avoids both the complex image processing techniques of conventional visual feature extraction methods and the estimator training required by audio-to-visual mapping methods. The audio-visual sequences used for the estimation tests were recorded with an ordinary webcam. The estimation results are promising: the estimated area of mouth opening is well correlated with the manually measured one, and the average correlation coefficient obtained by the most effective configuration of the proposed method, over a set of 16 French sentences, is 0.73.
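To make the SIMO formulation concrete, here is a minimal, noise-free sketch that treats one input signal s (standing in for the area of mouth opening) as driving several FIR channels whose outputs (standing in for the subband RMS energies) are observed. It uses the cross-relation method, one standard SIMO blind identification technique chosen here as an assumption and not necessarily the technique used in the paper. The channel order L is assumed known, the data are synthetic, and all function names (`data_matrix`, `cross_relation_identify`, `convolution_matrix`, `estimate_input`) are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def data_matrix(x, L):
    """Rows [x(n), x(n-1), ..., x(n-L)] for n = L..len(x)-1, so that
    data_matrix(x, L) @ h is the fully overlapping part of np.convolve(x, h)."""
    return sliding_window_view(x, L + 1)[:, ::-1]


def cross_relation_identify(outputs, L):
    """Estimate the channel impulse responses (up to one common scale factor)
    using the cross-relation x_i * h_j - x_j * h_i = 0 for every channel pair."""
    M = len(outputs)
    X = [data_matrix(x, L) for x in outputs]
    rows = []
    for i in range(M):
        for j in range(i + 1, M):
            block = np.zeros((X[i].shape[0], M * (L + 1)))
            block[:, j * (L + 1):(j + 1) * (L + 1)] = X[i]   # + x_i * h_j
            block[:, i * (L + 1):(i + 1) * (L + 1)] = -X[j]  # - x_j * h_i
            rows.append(block)
    R = np.vstack(rows)
    # The stacked channel vector spans the (ideally one-dimensional) null space
    # of R: take the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(R, full_matrices=False)
    return vt[-1].reshape(M, L + 1)


def convolution_matrix(h, n_in):
    """Matrix H such that H @ s == np.convolve(s, h) when len(s) == n_in."""
    L = len(h) - 1
    H = np.zeros((n_in + L, n_in))
    for m in range(n_in):
        H[m:m + L + 1, m] = h
    return H


def estimate_input(outputs, channels):
    """Least-squares estimate of the single input given the identified channels."""
    n_in = len(outputs[0]) - channels.shape[1] + 1
    A = np.vstack([convolution_matrix(h, n_in) for h in channels])
    b = np.concatenate(outputs)
    s_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s_hat


# Synthetic toy data (hypothetical): one input, three channels of order L = 2.
rng = np.random.default_rng(0)
s = rng.standard_normal(200)
true_h = rng.standard_normal((3, 3))
xs = [np.convolve(s, h) for h in true_h]  # observed outputs, no noise

h_hat = cross_relation_identify(xs, L=2)
s_hat = estimate_input(xs, h_hat)
r = np.corrcoef(s, s_hat)[0, 1]
print(round(abs(r), 3))
```

Because blind identification recovers the channels only up to a common scale factor (including sign), the estimated input carries the same ambiguity; comparing it with the reference through the absolute correlation coefficient, as in the evaluation above, is insensitive to that factor. Real subband RMS energies are noisy and the true channel order is unknown, so this toy setting is far easier than the recorded sentences.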
Index Terms: Lip geometric feature, area of mouth opening, speech temporal envelope processing, SIMO, blind deconvolution
Bibliographic reference. Do, Cong-Thanh / Aissa-El-Bey, Abdeldjalil / Pastor, Dominique / Goalic, André (2009): "Area of mouth opening estimation from speech acoustics using blind deconvolution technique", In AVSP-2009, 80-85.