12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Toward a Multi-Speaker Visual Articulatory Feedback System

Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly

GIPSA-Lab, Grenoble, France

In this paper, we present recent developments of the HMM-based acoustic-to-articulatory inversion approach that we are developing for a "visual articulatory feedback" system. In this approach, multistream phoneme HMMs are trained jointly on synchronous streams of acoustic and articulatory data acquired by electromagnetic articulography (EMA). Acoustic-to-articulatory inversion is achieved in two steps: phonetic and state decoding is performed first; then, articulatory trajectories are inferred from the decoded phone and state sequence using the maximum-likelihood parameter generation (MLPG) algorithm. We introduce here a new procedure for the re-estimation of the HMM parameters, based on the Minimum Generation Error (MGE) criterion. We also investigate the use of model adaptation techniques based on maximum likelihood linear regression (MLLR), as a first step toward a multi-speaker visual articulatory feedback system.
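The second step described above can be sketched as follows: given the per-frame means and variances of static and delta articulatory features emitted along the decoded HMM state sequence, MLPG solves a weighted least-squares problem for the smooth trajectory. This is a minimal illustrative sketch, not the paper's implementation; for clarity it uses a simple first-difference delta window and a single articulatory dimension, and all names and parameter values are assumptions.

```python
import numpy as np

def mlpg(mean_static, var_static, mean_delta, var_delta):
    """Illustrative maximum-likelihood parameter generation (one dimension).

    Solves (W^T D^-1 W) c = W^T D^-1 mu, where W stacks a static window
    (identity) and a first-difference delta window, and D is the diagonal
    covariance of the stacked static + delta targets.
    """
    T = len(mean_static)
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)              # static rows: c[t]
    for t in range(1, T):             # delta rows: c[t] - c[t-1]
        W[T + t, t] = 1.0
        W[T + t, t - 1] = -1.0
    mu = np.concatenate([mean_static, mean_delta])
    prec = np.concatenate([1.0 / var_static, 1.0 / var_delta])  # D^-1 diagonal
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)

# Toy usage: two decoded states contribute a step in the static targets;
# zero delta targets with small variance favor a smooth transition.
means = np.array([0.0] * 5 + [1.0] * 5)   # static means per frame
deltas = np.zeros(10)                      # delta means per frame
traj = mlpg(means, np.ones(10), deltas, 0.1 * np.ones(10))
```

Because the delta stream penalizes frame-to-frame jumps, the generated trajectory interpolates smoothly across the state boundary instead of reproducing the stepwise state means.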


Bibliographic reference.  Ben Youssef, Atef / Hueber, Thomas / Badin, Pierre / Bailly, Gérard (2011): "Toward a multi-speaker visual articulatory feedback system", In INTERSPEECH-2011, 589-592.