4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Pseudo-articulatory Speech Synthesis for Recognition using Automatic Feature Extraction from X-Ray Data

C. S. Blackburn, S. J. Young

Cambridge University Engineering Department (CUED), UK

We describe a self-organising pseudo-articulatory speech production model (SPM) trained on an X-ray microbeam database, and present results when using the SPM within a speech recognition framework. Given a time-aligned phonemic string, the system uses an explicit statistical model of co-articulation to generate pseudo-articulator trajectories. From these, parametrised speech vectors are synthesised using a set of artificial neural networks (ANNs). We present an analysis of the articulatory information in the database used, and demonstrate the improvements in articulatory modelling accuracy obtained using our co-articulation system. Finally, we give results when using the SPM to re-score N-best utterance transcription lists as produced by the CUED HTK Hidden Markov Model (HMM) speech recognition system. Relative reductions of 18% in the phoneme error rate and 15% in the word error rate are achieved.
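The N-best re-scoring step described above can be illustrated with a minimal sketch. The abstract does not say how the SPM score is computed or combined with the HMM score, so everything here is an assumption made for illustration: the names `synthesis_score`, `rescore_nbest` and `relative_reduction`, the caller-supplied `synthesise` function standing in for the SPM, the negative mean squared error as the match measure, and the linear combination with `weight`.

```python
import numpy as np


def synthesis_score(synth, observed):
    """Negative mean squared error between synthesised and observed frames.

    Assumption: the abstract does not name a distance measure; MSE is a
    stand-in. Higher (closer to zero) means a better match.
    """
    synth = np.asarray(synth, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return -float(np.mean((synth - observed) ** 2))


def rescore_nbest(hypotheses, observed, synthesise, weight=1.0):
    """Re-rank (transcription, hmm_log_likelihood) pairs.

    `synthesise` is a hypothetical callable that maps a transcription to
    parametrised speech vectors, playing the role of the SPM. The linear
    score combination and `weight` are illustrative assumptions.
    """
    scored = []
    for text, hmm_loglik in hypotheses:
        total = hmm_loglik + weight * synthesis_score(synthesise(text), observed)
        scored.append((total, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


def relative_reduction(baseline_error, new_error):
    """Relative error reduction, (baseline - new) / baseline."""
    return (baseline_error - new_error) / baseline_error
```

With this definition, a relative reduction of 15% means the re-scored error rate is 0.85 times the baseline error rate, not 15 percentage points lower.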

Bibliographic reference. Blackburn, C. S. / Young, S. J. (1996): "Pseudo-articulatory speech synthesis for recognition using automatic feature extraction from x-ray data", in ICSLP-1996, 969-972.