5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Novel Filler Acoustic Models for Connected Digit Recognition

Ilija Zeljkovic, Shrikanth Narayanan

AT&T Labs, Florham Park, NJ, USA

The context-dependent modeling technique is extended to include non-speech filler segments occurring between speech word units. In addition to the conventional context-dependent word or subword units, the proposed acoustic modeling provides an efficient way of accounting for the effects of the surrounding speech on the inter-word non-speech segments, especially for small-vocabulary recognition tasks. It is argued that a robust recognition scheme is obtained by explicitly modeling context-dependent inter-word filler acoustics in training while ignoring their explicit context dependencies during recognition testing. Results on a connected digit recognition task over the telephone network show that the improved model set lowers the word error rate from 2.5% to 0.9%, i.e., a relative word error-rate reduction of about 64%.
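
The 64% figure is a relative reduction, not a difference in percentage points (the absolute drop is 1.6 points). A minimal sketch of that arithmetic follows; the function name `relative_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative error-rate reduction as a fraction of the baseline.

    Hypothetical helper for illustration; both rates must use the same
    units (here, percent word error rate).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Error rates quoted in the abstract, in percent.
reduction = relative_reduction(2.5, 0.9)
print(f"{reduction:.0%}")  # prints "64%"
```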


Bibliographic reference. Zeljkovic, Ilija / Narayanan, Shrikanth (1997): "Novel filler acoustic models for connected digit recognition", in EUROSPEECH-1997, 283-286.