Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Transition-Based Feature Extraction Within Frame-Based Recognition

Zhihong Hu, Etienne Barnard, Ronald A. Cole

Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology

Current frame-based speech recognition systems sample speech at a fixed set of locations relative to each frame, which complicates modeling the temporal dynamics of speech. This work shows that explicitly using transitional information when extracting features gives a better model of acoustic-phonetic structure and results in higher word-level recognition performance. In the proposed approach, the features represent local transitional information: a constant number of features is selected at each time frame, but the features are sampled near the areas of greatest spectral change within a relatively long window. Modeling transitions explicitly in this way also lets us model local contextual information. Using this technique, the word-level error rate decreased by up to 30% on the databases we tested.
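The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the spectral-change measure, the window length, or the number of sampled locations, so the Euclidean frame-to-frame distance, the names `spectral_change` and `transition_features`, and the parameters `half_window` and `k` are all assumptions made here for illustration.

```python
import math


def spectral_change(frames):
    """Euclidean distance between each frame and its predecessor.

    Assumption: the abstract does not define "spectrum change";
    a plain frame-to-frame distance is used as a stand-in.
    """
    change = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        change.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return change


def transition_features(frames, center, half_window=5, k=2):
    """Build a fixed-length feature vector for the frame at index `center`.

    Instead of sampling at fixed offsets, pick the k frames with the
    largest spectral change inside a long window around `center`.
    `half_window` and `k` are hypothetical parameters.
    """
    change = spectral_change(frames)
    lo = max(0, center - half_window)
    hi = min(len(frames), center + half_window + 1)
    if hi - lo < k:
        raise ValueError("window holds fewer than k frames")
    # Rank window positions by change, largest first; ties go to the earlier frame.
    ranked = sorted(range(lo, hi), key=lambda t: (-change[t], t))
    picked = sorted(ranked[:k])  # restore time order
    features = []
    for t in picked:
        features.extend(frames[t])   # spectral vector at the transition
        features.append(t - center)  # where the transition lies relative to the frame
    return picked, features


# Toy "spectra": two coefficients per frame, with jumps at frames 3 and 6.
frames = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0],
          [1.0, 1.0], [1.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
picked, features = transition_features(frames, center=4, half_window=3, k=2)
```

In this toy input the two sampled locations fall on the frames where the spectrum jumps, and the feature vector has the same length for every frame, which is the property the abstract describes as "a constant number of features ... at each time frame". Appending the relative offset of each sampled location is a further assumption, added so the vector records where the transitions occur.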

Bibliographic reference. Hu, Zhihong / Barnard, Etienne / Cole, Ronald A. (1995): "Transition-based feature extraction within frame-based recognition", in EUROSPEECH-1995, 1555-1558.