First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Modeling of 3-Dimensional Vocal Tract Shapes Obtained by Magnetic Resonance Imaging for Speech Synthesis

Masafumi Matsumura (1), Atsushi Sugiura (2)

(1) Dept. of Applied Electronics, Fac. of Engineering, Osaka Electro-Communication University, Osaka, Japan
(2) Dept. of Electrical Engineering, Fac. of Engineering, Osaka University, Osaka, Japan

Three-dimensional vocal tract shapes were studied using magnetic resonance (MR) imaging. MR images of 24 horizontal sections from the larynx to the nasal cavity, at 0.6-cm intervals, were acquired during steady-state production of Japanese vowels. The measurement time was 123 seconds. The outlines of the vocal tract interior were extracted manually from the horizontal MR images using a digitizer, and three-dimensional vocal tract shapes were obtained from the 24 horizontal sections. Curvature functions of the midsagittal tongue shape and cross-sectional areas of the vocal tract interior were estimated from the three-dimensional vocal tract data. Based on these observations, a three-dimensional vocal tract model that estimates the vocal tract area function from two positions on the frontal tongue surface was proposed for natural speech synthesis. The vocal tract model consists of articulatory models for the midsagittal tongue shape and for the cross-sectional area of the vocal tract interior. Vowels were synthesized by adapting the model parameters for the two positions on the frontal tongue surface. The time courses of the formant frequencies of the synthesized vowels agreed with those of the subjects' original productions. The results indicate the usefulness of the proposed models, which are based on observation of the three-dimensional vocal tract shape.
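
As an illustration of the measurement step described above, the following minimal sketch shows one way an area could be computed from a manually digitized outline and how a stack of equally spaced sections could be combined into a volume. The abstract does not state how the areas were actually computed, so the shoelace formula, the trapezoidal stacking, and the names `polygon_area_cm2` and `stack_volume_cm3` are assumptions made for illustration only. The area of a horizontal section is also not necessarily the cross-sectional area used in the area function.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) vertex of a digitized outline, in cm


def polygon_area_cm2(contour: Sequence[Point]) -> float:
    """Area enclosed by a closed outline, via the shoelace formula.

    Hypothetical helper: the outline is assumed to be a simple polygon
    whose vertices are listed in order around the boundary.
    """
    n = len(contour)
    if n < 3:
        return 0.0  # fewer than three vertices enclose no area
    twice_area = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def stack_volume_cm3(areas_cm2: List[float], spacing_cm: float = 0.6) -> float:
    """Volume of a stack of parallel sections, by the trapezoidal rule.

    The 0.6-cm default is the section interval given in the abstract;
    the trapezoidal rule itself is an assumption of this sketch.
    """
    return sum(
        0.5 * (a + b) * spacing_cm
        for a, b in zip(areas_cm2, areas_cm2[1:])
    )


# Usage: a 2 cm x 2 cm square outline, and 24 sections of 1 cm^2 each.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_area = polygon_area_cm2(square)
uniform_volume = stack_volume_cm3([1.0] * 24)
```

With 24 sections at 0.6-cm intervals there are 23 gaps between sections, so the stack spans 13.8 cm from the first section to the last.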

