ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing

ICC Jeju, Korea
October 3, 2004

Auditory-Based Automatic Speech Recognition

Werner Hemmert (1), Marcus Holmberg (1), David Gelbart (2)

(1) Infineon Technologies, Corporate Research, Munich, Germany
(2) International Computer Science Institute, Berkeley, CA, USA

In this paper, we develop a physiologically motivated model of peripheral auditory processing and evaluate how its individual processing steps influence automatic speech recognition in noise. The model features large dynamic compression (>60 dB) and a realistic sensory cell model. The compression range was well matched to the limited dynamic range of the sensory cells, and the model yielded surprisingly high recognition scores. We also developed a computationally efficient, simplified model of auditory processing and found that a model of adaptation could improve recognition accuracy. Adaptation is a basic principle of neuronal processing, which accentuates signal onsets. Applying this adaptation model to mel-frequency cepstral coefficient (MFCC) feature extraction raised recognition accuracy in noise (AURORA 2 task, averaged recognition scores) from 56.4% to 75.6% (clean training condition), a relative improvement of 41% in word error rate. Adaptation outperformed RASTA processing by more than 10 percentage points, which corresponds to a relative improvement of 31%.
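The relation between recognition accuracy and relative word-error-rate improvement quoted above can be sketched as follows. This is an illustrative calculation only, not the authors' evaluation script: the function name `relative_wer_reduction` is hypothetical, and the sketch assumes that word error rate is simply 100% minus the averaged recognition score.

```python
def relative_wer_reduction(acc_before: float, acc_after: float) -> float:
    """Relative word-error-rate reduction in percent.

    Both arguments are word accuracies in percent. Assumption (not stated
    in the abstract): WER = 100 - accuracy.
    """
    wer_before = 100.0 - acc_before
    wer_after = 100.0 - acc_after
    if wer_before <= 0.0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (wer_before - wer_after) / wer_before


# Averaged AURORA 2 scores from the abstract (clean training condition).
reduction = relative_wer_reduction(56.4, 75.6)
print(f"{reduction:.1f}% relative WER reduction")
```

Applied directly to the averaged scores, this gives roughly 44%, somewhat above the 41% stated in the abstract. The abstract does not say how its figure was averaged; one possible explanation is that relative improvements were computed per test condition and then averaged, which this sketch does not reproduce.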

Bibliographic reference.  Hemmert, Werner / Holmberg, Marcus / Gelbart, David (2004): "Auditory-based automatic speech recognition", In SAPA-2004, paper 74.