5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Hierarchical Duration Modelling for Speech Recognition Using the ANGIE Framework

Grace Chung, Stephanie Seneff

Spoken Language Systems Group Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, Massachusetts, USA

We describe a novel hierarchical duration model for speech recognition. The modelling scheme is based on the ANGIE framework, a flexible, unified sublexical representation for speech applications. Our duration model simultaneously captures contextual factors that influence the duration of sublexical units at multiple linguistic levels, using both relative and absolute duration information. The modelling procedure involves a normalization scheme that produces a new measure of relative speaking rate at the word level. This measure can be used to explore phenomena in speech timing, and we present studies of secondary effects of speaking rate. The duration model aids speech recognition: in phonetic recognition experiments it yields a relative improvement of up to 7.7%. In word spotting, a study using duration as a post-processor to disambiguate between two acoustically similar keywords reduces the relative error by 68%. Furthermore, a fully integrated duration model in an ANGIE-based word spotter improves performance by 21.5%. All gains are in addition to those realized by the standard phone duration models already present in the baseline system. All experiments were conducted in the ATIS domain, using continuous spontaneous speech.
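The abstract does not spell out the normalization scheme, so the following is only a minimal sketch under assumed definitions, not the authors' method. It assumes a sublexical parse tree whose leaves carry measured durations, takes a node's "relative duration" to be the fraction of its parent's duration it occupies, and takes a word's relative speaking rate to be its observed duration divided by the sum of per-phone mean durations. The names `Node`, `relative_durations`, and `speaking_rate`, the node labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a sublexical parse tree (hypothetical structure)."""
    label: str
    duration: float = 0.0          # seconds; only leaves store a measured duration
    children: list = field(default_factory=list)

def total_duration(node: Node) -> float:
    """Absolute duration of a node: the sum of its leaf durations."""
    if not node.children:
        return node.duration
    return sum(total_duration(c) for c in node.children)

def leaves(node: Node):
    """Yield the leaf (phone-level) nodes in left-to-right order."""
    if not node.children:
        yield node
    else:
        for c in node.children:
            yield from leaves(c)

def relative_durations(node: Node) -> list:
    """Fraction of the parent's duration taken by each child; fractions sum to 1."""
    parent = total_duration(node)
    return [(c.label, total_duration(c) / parent) for c in node.children]

def speaking_rate(word: Node, mean_leaf_duration: dict) -> float:
    """Assumed rate measure: observed word duration / expected duration.
    Values above 1 mean the word was spoken more slowly than average."""
    expected = sum(mean_leaf_duration[leaf.label] for leaf in leaves(word))
    return total_duration(word) / expected

# Toy word "to" with made-up durations and made-up per-phone means.
word = Node("to", children=[
    Node("onset", children=[Node("t", 0.05)]),
    Node("rhyme", children=[Node("uw", 0.15)]),
])
means = {"t": 0.06, "uw": 0.10}

print(relative_durations(word))   # onset ≈ 0.25, rhyme ≈ 0.75
print(speaking_rate(word, means)) # ≈ 1.25
```

Separating relative fractions (which describe how a duration is shared among sublexical units) from the absolute ratio (which describes overall tempo) mirrors the abstract's use of both relative and absolute duration information, though the actual model may combine them differently.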


Bibliographic reference.  Chung, Grace / Seneff, Stephanie (1997): "Hierarchical duration modelling for speech recognition using the ANGIE framework", In EUROSPEECH-1997, 1475-1478.