4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Articulatory Synthesis from X-rays and Inversion for an Adaptive Speech Robot

Pierre Badin, Christian Abry

Institut de la Communication Parlée, UPRESA CNRS, Grenoble, France

This paper describes a speech-robotics approach to articulatory synthesis. An anthropomorphic speech robot has been built from data recorded on a real reference subject. This speech robot, called the Articulotron, has a set of relevant degrees of freedom for the speech articulators: jaw, tongue, lips, and larynx. The associated articulatory model has been elaborated from cineradiographic midsagittal profiles recorded in synchrony with front views of the lips; the noise source model for fricative excitation has been derived from acoustic and aerodynamic measurements on the same reference subject. In a first phase, the Articulotron has been used to perform copy synthesis of the vowels and of the fricative and plosive consonants in the X-ray corpus. This makes it possible to assess the performance of the Articulotron in producing fairly high-quality speech, and provides a reference against which other attempts at articulatory synthesis can be compared. In a second phase, the Articulotron has been used to recover articulatory gestures from audio-visual speech prototypes. At the present stage, a gradient descent algorithm is used to learn the articulatory trajectories of the robot by optimisation, starting from the formant trajectories and from knowledge of the constraints on the consonantal constriction or closure, so as to mimic the original VCV audio-visual sequences. The adaptive skill of the robot is demonstrated through articulator perturbation experiments and through the elaboration of relevant strategies in the hyper/hypo speech paradigm. A videotape will show an animation of the Articulotron, displaying the jaw, the tongue and the lips, for various examples of adaptive articulatory synthesis.
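The inversion step described above, recovering articulatory trajectories from formant trajectories by gradient descent, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_formants` is a hypothetical linear stand-in for the articulatory-to-acoustic map (it is not the Articulotron model), the two parameters `jaw` and `tongue` are placeholders, the gradient is estimated by finite differences, and the smoothness penalty is only one plausible way to regularise a trajectory. The constraints on consonantal constriction or closure mentioned in the abstract are not modelled here.

```python
def toy_formants(params):
    """Hypothetical forward map: two articulatory parameters -> (F1, F2) in Hz."""
    jaw, tongue = params
    f1 = 500.0 + 200.0 * jaw - 50.0 * tongue
    f2 = 1500.0 - 100.0 * jaw + 400.0 * tongue
    return (f1, f2)


def trajectory_cost(traj, targets, smooth=0.1):
    """Squared formant error (in units of 100 Hz) plus a frame-to-frame smoothness penalty."""
    acoustic = sum(
        ((f - t) / 100.0) ** 2
        for p, target in zip(traj, targets)
        for f, t in zip(toy_formants(p), target)
    )
    jumps = sum(
        (a - b) ** 2
        for p, q in zip(traj, traj[1:])
        for a, b in zip(p, q)
    )
    return acoustic + smooth * jumps


def invert(targets, steps=500, lr=0.02, eps=1e-5):
    """Plain gradient descent on the whole trajectory, with a finite-difference gradient."""
    traj = [[0.0, 0.0] for _ in targets]  # neutral starting posture (assumption)
    for _ in range(steps):
        base = trajectory_cost(traj, targets)
        grad = [[0.0, 0.0] for _ in targets]
        for i in range(len(traj)):
            for j in range(2):
                traj[i][j] += eps
                grad[i][j] = (trajectory_cost(traj, targets) - base) / eps
                traj[i][j] -= eps
        for i in range(len(traj)):
            for j in range(2):
                traj[i][j] -= lr * grad[i][j]
    return traj


# Usage: build target formants from a known (hypothetical) gesture, then invert them.
true_traj = [[k / 4.0, 1.0 - k / 4.0] for k in range(5)]
targets = [toy_formants(p) for p in true_traj]
recovered = invert(targets)
```

With a realistic, non-linear articulatory model the cost surface is no longer quadratic and the articulatory-to-acoustic map is many-to-one, so the starting posture and the additional constraints would matter far more than they do in this toy case.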

