Auditory-Visual Speech Processing
A person-independent representation of lip movements is crucial to developing a multimodal speech recognizer. The geometric models used in most lip-tracking techniques can remove some person-specific features, such as skin texture or color, and appropriate normalization of the data and its projection into the principal component space can reduce the amount of person-specific information even further. Although Principal Component Analysis (PCA) of a multi-person data set reveals some interesting features, the inter-person variation is too large to allow for robust speech recognition. There are, however, substantial similarities in the lip-shape variations when only single-person data sets are analyzed. We propose an adaptive PCA that updates the projection coefficients with respect to the data available for the specific person.
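The abstract does not say how the adaptation is carried out, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that a shared PCA basis is fitted on multi-person lip-shape vectors and that only the centering mean is adapted to the current speaker as that speaker's frames arrive. The names `fit_pca`, `AdaptiveProjector`, and `prior_weight` are hypothetical, and the data is random and serves only to make the example runnable.

```python
import numpy as np


def fit_pca(data, n_components):
    """Fit PCA via SVD. Rows of `data` are samples; returns (mean, components)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


class AdaptiveProjector:
    """Projects frames onto a shared basis while adapting the mean to one person.

    Assumption: the multi-person mean acts as `prior_weight` pseudo-observations,
    so the estimate drifts toward the speaker's own mean as frames accumulate.
    """

    def __init__(self, global_mean, components, prior_weight=10.0):
        self.mean = np.asarray(global_mean, dtype=float).copy()
        self.components = components
        self.weight = float(prior_weight)

    def update(self, frame):
        # Running-mean update that includes the prior pseudo-observations.
        self.weight += 1.0
        self.mean += (frame - self.mean) / self.weight

    def project(self, frame):
        # Center with the person-adapted mean, then project onto the shared basis.
        return (frame - self.mean) @ self.components.T


# Synthetic stand-in for multi-person lip-shape vectors (200 frames, 8 dimensions).
rng = np.random.default_rng(0)
multi_person = rng.normal(size=(200, 8))
global_mean, components = fit_pca(multi_person, n_components=3)

# A new speaker whose lip shapes are offset from the multi-person mean.
speaker_offset = np.full(8, 2.0)
projector = AdaptiveProjector(global_mean, components, prior_weight=10.0)
for _ in range(500):
    frame = speaker_offset + rng.normal(scale=0.1, size=8)
    projector.update(frame)

features = projector.project(speaker_offset)
```

Adapting only the mean is the simplest choice; a fuller scheme could also re-estimate the basis itself from the speaker's data, at the cost of needing more frames per person.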
Bibliographic reference. Wojdel, Jacek C. / Rothkrantz, Leon J.M. (2001): "Obtaining person-independent feature space for lip reading", In AVSP-2001, 200 (abstract).