Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


Modelling Articulatory Inter-Timing Variation in a Speech Recognition System Based on Synthetic References

M. Blomberg

Department of Speech Communication and Music Acoustics, KTH, Stockholm, Sweden

Variation in the synchrony between two or more simultaneous articulatory gestures in speech can cause large variability in the acoustic signal and lower the accuracy and robustness of recognition systems. This report describes a technique that accounts for this effect by predicting alternative pronunciations of an utterance. A formant-based speech production system generates the reference templates used for recognition. The production system systematically varies the delay between the voicing transition and the formant movements, which forms different paths through a transition network at phoneme boundaries. In a pilot experiment, the recogniser's behaviour was examined for utterances that differ in the time position of the devoicing of phrase-final vowels.
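The idea of alternative paths at a phoneme boundary can be illustrated with a minimal sketch. It is not the system described in the paper: the frame representation (a voicing flag plus a single formant value), the linear formant movement, the frame-wise distance, and all function names and parameters are assumptions made for illustration. Each delay value yields one variant of the boundary in which the formant movement is unchanged and only the voicing offset is shifted; the recogniser then keeps the variant closest to the observed frames.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame representation: (voiced flag, first-formant frequency in Hz).
Frame = Tuple[bool, float]


def boundary_variants(n_frames: int, f_start: float, f_end: float,
                      nominal_offset: int,
                      delays: Sequence[int]) -> Dict[int, List[Frame]]:
    """Build one frame sequence per delay (in frames).

    The formant trajectory is identical in every variant; only the frame at
    which voicing stops is shifted by the delay relative to nominal_offset.
    """
    variants: Dict[int, List[Frame]] = {}
    for d in delays:
        # Clamp the shifted voicing offset to the boundary segment.
        offset = min(max(nominal_offset + d, 0), n_frames)
        frames: List[Frame] = []
        for t in range(n_frames):
            # Assumed linear formant movement from f_start to f_end.
            alpha = t / (n_frames - 1) if n_frames > 1 else 0.0
            f1 = f_start + alpha * (f_end - f_start)
            frames.append((t < offset, f1))
        variants[d] = frames
    return variants


def frame_distance(a: Frame, b: Frame, voicing_penalty: float = 100.0) -> float:
    """Assumed distance: formant difference plus a penalty for a voicing mismatch."""
    mismatch = voicing_penalty if a[0] != b[0] else 0.0
    return abs(a[1] - b[1]) + mismatch


def best_variant(observed: Sequence[Frame],
                 variants: Dict[int, List[Frame]]) -> Tuple[int, float]:
    """Return (delay, total distance) of the variant closest to the observed frames.

    Frames are compared one-to-one; no time alignment is performed here.
    """
    scores = {
        d: sum(frame_distance(o, r) for o, r in zip(observed, frames))
        for d, frames in variants.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    variants = boundary_variants(6, 700.0, 500.0, nominal_offset=3,
                                 delays=[-2, -1, 0, 1, 2])
    observed = list(variants[1])  # an utterance whose devoicing comes one frame late
    print(best_variant(observed, variants))
```

A real recogniser would use spectral frames and time alignment rather than equal-length sequences; the sketch only shows how a set of delay-specific variants can stand in for a single fixed reference at the boundary.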

Bibliographic reference. Blomberg, M. (1991): "Modelling articulatory inter-timing variation in a speech recognition system based on synthetic references", in EUROSPEECH-1991, 789-792.