5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Suprasegmental Duration Modelling with Elastic Constraints in Automatic Speech Recognition

Laurence Molloy, Stephen Isard

Centre for Speech Technology Research, UK

This paper presents a method of integrating a model of suprasegmental duration with an HMM-based recogniser at the post-processing level. The N-best utterance output is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in post-processing imposes syllable-level elastic constraints on the durational behaviour of speech segments. Word accuracy on the Resource Management database after rescoring is reported for two different syllable-like constraint units, a fixed-size N-phone window, and simple (no-constraint) phone duration probability scoring.
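The rescoring step described above can be illustrated with a minimal sketch. It assumes that each N-best hypothesis already carries an acoustic log-likelihood and a duration log-likelihood, and that the linear combination takes the form acoustic + w * duration with a single tunable weight w. The names `Hypothesis`, `rescore` and `duration_weight`, the example word strings and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: str              # word string of one N-best entry
    acoustic_loglik: float  # log-likelihood from the HMM set
    duration_loglik: float  # log-likelihood from the durational model


def combined_score(hyp: Hypothesis, duration_weight: float) -> float:
    # Assumed form of the linear combination: acoustic + w * duration.
    return hyp.acoustic_loglik + duration_weight * hyp.duration_loglik


def rescore(nbest: List[Hypothesis], duration_weight: float) -> List[Hypothesis]:
    # Reorder the N-best list, best combined score first.
    return sorted(
        nbest,
        key=lambda hyp: combined_score(hyp, duration_weight),
        reverse=True,
    )


# Made-up scores, for illustration only.
nbest = [
    Hypothesis("show all ships", acoustic_loglik=-1200.0, duration_loglik=-45.0),
    Hypothesis("show tall ships", acoustic_loglik=-1198.0, duration_loglik=-60.0),
]

best = rescore(nbest, duration_weight=0.5)[0]
print(best.words)
```

With a weight of zero the ordering reduces to the purely acoustic ranking, so the weight controls how far the durational evidence may override the acoustic score. In practice such a weight would be tuned on held-out data.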

