International Conference on Auditory-Visual Speech Processing 2008
Tangalooma Wild Dolphin Resort,
Moreton Island, Queensland, Australia
In this paper, four approaches to parameterization (dimension reduction) applied to facial speech movements from 3D motion-captured speech data are compared. The three-dimensional coordinates of 27 markers glued on the face of a speaker articulating a series of German VCV sequences were extracted and parameterized. As dimension reduction methods, a principal component analysis (PCA) and a guided PCA were used. Additionally, we used a newly proposed method called guided non-linear Model Estimation (gnoME). For a better comparison between the methods (especially with the plain PCA), we also used a non-guided version of gnoME (in the following called noME). To evaluate the different parameterization approaches, the speech data was reconstructed by each method using its first five components. Furthermore, the original motion capture data was re-synthesized. The reconstructed and re-synthesized visual speech stimuli, the original video, and the audio signal without video presentation were compared in a perception experiment with respect to intelligibility. As objective measures, the explained variance and the mean marker deviation (mean Euclidean error) were used. The results of the objective measures show that the non-linear Model Estimation represents the data more accurately, while the intelligibility scores of all parameterizations equal those of the original data.
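The abstract does not specify the implementation details of the methods. As a minimal sketch of the plain-PCA baseline and the two objective measures (explained variance and mean Euclidean marker error), assuming the data is arranged as frames × (27 markers × 3 coordinates), one could write:

```python
import numpy as np

def pca_reconstruct(X, n_components=5):
    """Fit PCA on frames-by-coordinates data X and reconstruct it
    from the first n_components principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD-based PCA: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # (k, dims)
    scores = Xc @ components.T              # (frames, k)
    X_rec = scores @ components + mean
    # Fraction of total variance captured by the first k components
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return X_rec, explained

def mean_marker_deviation(X, X_rec, n_markers=27):
    """Mean Euclidean distance per marker between original and
    reconstructed frames (coordinates flattened as 3 * n_markers)."""
    diff = (X - X_rec).reshape(len(X), n_markers, 3)
    return np.linalg.norm(diff, axis=2).mean()

# Synthetic stand-in for the motion-capture data:
# 200 frames of 27 markers x 3 coordinates (81 dims)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 81)) @ rng.normal(size=(81, 81))

X_rec, var = pca_reconstruct(X, n_components=5)
err = mean_marker_deviation(X, X_rec)
```

The synthetic data, function names, and dimensions here are illustrative assumptions; the guided PCA, gnoME, and noME variants described in the paper would replace the plain SVD step with their respective guided or non-linear estimation procedures.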
Bibliographic reference. Madany, Katja / Fagel, Sascha (2008): "Objective and perceptual evaluation of parameterizations of 3d motion captured speech data", In AVSP-2008, 195-198.