13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Articulatory Speaker Normalisation Based on MRI-data Using Three-way Linear Decomposition Methods

Julián Andrés Valdés Vargas (1), Pierre Badin (1), Laurent Lamalle (2)

(1) GIPSA-lab (Département Parole & Cognition), UMR 5216 CNRS, Grenoble University, France
(2) SFR1 RMN Biomédicale et Neurosciences (Unité IRM Recherche 3 Tesla), INSERM, CHU de Grenoble, France

The aim of this study was to characterise, model and compare the lingual articulatory strategies of a group of speakers. Individual principal component analysis (PCA) and multi-linear decomposition methods were applied to different representations of the tongue contour extracted from magnetic resonance images (MRI). The corpus consisted of seven speakers articulating 63 French vowels and consonants. For the individual PCA with 4 components, the Root Mean Square prediction Error (RMSE), averaged over the seven speakers, was 0.12 cm, corresponding to 92.6% of the variance explained. Several multi-linear decomposition methods, which model the tongue contour with a single set of components, were performed and compared. Of these, the 2-level PCA gave the best results. A Student's t-test at the 5% significance level showed that, with 11 components, the 2-level PCA matches the performance of the individual PCA, explaining 91% of the variance with an RMSE of 0.11 cm. With 4 components, the same method explains 75% of the variance with an RMSE of 0.19 cm.
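To make the two reported figures of merit concrete, the following is a minimal sketch of how an RMSE and a percentage of variance explained can be computed for a PCA reconstruction with a fixed number of components. It is an illustration under stated assumptions, not the authors' implementation: the function name `pca_reconstruction_metrics`, the synthetic data, and the choice of 40 coordinates per contour are all hypothetical, and the exact RMSE definition used in the paper is not given in the abstract.

```python
import numpy as np


def pca_reconstruction_metrics(X, n_components):
    """Return (rmse, variance_explained) for a PCA reconstruction of X.

    X has shape (n_articulations, n_coordinates); each row is one
    flattened tongue contour (hypothetical layout).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data: rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    # Reconstruct the data from the first k components only.
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k] + mean
    residual = X - X_hat
    # RMSE over all coordinates (one possible definition, assumed here).
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    # Fraction of the total centred variance captured by the reconstruction.
    var_explained = float(1.0 - np.sum(residual ** 2) / np.sum(Xc ** 2))
    return rmse, var_explained


# Synthetic stand-in data (an assumption, not the MRI corpus):
# 63 articulations, 40 coordinates, 4 underlying factors plus small noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(63, 4))
loadings = rng.normal(size=(4, 40))
X = scores @ loadings + 0.05 * rng.normal(size=(63, 40))

rmse, var_explained = pca_reconstruction_metrics(X, n_components=4)
print(f"RMSE = {rmse:.3f}, variance explained = {100 * var_explained:.1f}%")
```

In a per-speaker setting, a function like this would be called once per speaker and the results averaged; a joint model would instead share one set of components across speakers, which is why it typically needs more components to reach the same error.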

Index Terms: Articulatory modelling, speaker normalisation, factor analysis, MRI


Bibliographic reference.  Vargas, Julián Andrés Valdés / Badin, Pierre / Lamalle, Laurent (2012): "Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods", In INTERSPEECH-2012, 2186-2189.