12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Generating Animated Pronunciation from Speech Through Articulatory Feature Extraction

Yurie Iribe (1), Silasak Manosavanh (1), Kouichi Katsurada (1), Ryoko Hayashi (2), Chunyue Zhu (2), Tsuneo Nitta (1)

(1) Toyohashi University of Technology, Japan
(2) Kobe University, Japan

We automatically generate CG animations that show the articulatory movements behind a speaker's pronunciation, using articulatory feature (AF) extraction, to support pronunciation learning. The proposed system uses magnetic resonance imaging (MRI) data to map AFs to the coordinate values needed to generate the animations. MRI data let us observe the movements of the tongue, palate, and pharynx in detail while a person utters words. Both the AFs and the coordinate values are extracted by multi-layer neural networks (MLNs). The system displays animations of the pronunciation movements of both the learner and the teacher, generated from their speech, to show where the learner's pronunciation goes wrong. Learners can thus see both their own mispronunciation and the correct way to pronounce through concrete animations. Experiments comparing MRI data with the generated animations confirmed the accuracy of the articulatory features. Additionally, we verified the effectiveness of using AFs to generate the animations.
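
The two-stage mapping described in the abstract (speech frame to AFs, then AFs to articulator coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the names `extract_afs` and `afs_to_coords`, the sigmoid activations, and the random untrained weights are all assumptions chosen only to show the data flow. A real system would train the first network on AF-labelled speech and the second on coordinates traced from MRI images.

```python
import math
import random

# All sizes below are hypothetical; the abstract does not state them.
N_ACOUSTIC = 12   # acoustic features per speech frame (assumed)
N_HIDDEN = 8      # hidden units in each network (assumed)
N_AF = 5          # articulatory features per frame (assumed)
N_COORD = 6       # three (x, y) articulator points (assumed)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one dense layer to vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Untrained placeholder networks: one hidden layer each.
af_net = [make_layer(N_ACOUSTIC, N_HIDDEN, rng), make_layer(N_HIDDEN, N_AF, rng)]
coord_net = [make_layer(N_AF, N_HIDDEN, rng), make_layer(N_HIDDEN, N_COORD, rng)]


def extract_afs(frame):
    """Stage 1: map one acoustic frame to AF scores in (0, 1)."""
    hidden = dense(frame, af_net[0], sigmoid)
    return dense(hidden, af_net[1], sigmoid)


def afs_to_coords(afs):
    """Stage 2: map AF scores to a list of (x, y) articulator points."""
    hidden = dense(afs, coord_net[0], sigmoid)
    flat = dense(hidden, coord_net[1])  # linear output for coordinate regression
    return list(zip(flat[0::2], flat[1::2]))


# One synthetic frame stands in for real acoustic features.
frame = [rng.uniform(-1.0, 1.0) for _ in range(N_ACOUSTIC)]
afs = extract_afs(frame)
points = afs_to_coords(afs)
```

Running both stages on each frame of the learner's and the teacher's speech would yield two point sequences that an animation layer could draw side by side; the drawing step is outside this sketch.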

Bibliographic reference.  Iribe, Yurie / Manosavanh, Silasak / Katsurada, Kouichi / Hayashi, Ryoko / Zhu, Chunyue / Nitta, Tsuneo (2011): "Generating animated pronunciation from speech through articulatory feature extraction", In INTERSPEECH-2011, 1617-1620.