4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Conventional features used in state-of-the-art hidden Markov model (HMM) based speech recognition systems are commonly inspired by scientific knowledge of the human vocal and auditory systems. Although the intent of feature analysis is to extract "relevant" and "discriminative" information from the signal that is useful for speech recognition, this information may not be consistent with the objective of minimizing the error rate of the recognition process. In this paper, we use feed-forward artificial neural networks (ANNs) to generate a new class of features for speech recognition. We propose a system that integrates the feature extraction process with the recognition process under a unified statistical framework, using a single consistent objective function designed to minimize the recognition error rate. Results on a telephone-based, speaker-independent connected digit task indicate that this integrated system with 12 ANNs reduces the per-digit error rate by a further 28% relative to a similar system using a single ANN, and by 16% relative to our previous best results, in which no feature transformation was incorporated.
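The consistent objective referred to above is minimum classification error (MCE) training at the string level. As a rough illustration only, the sketch below implements the generic MCE loss as commonly written in the MCE literature, not necessarily the exact form used in this paper. It assumes that `g_correct` is the discriminant score (for example, an HMM log-likelihood) of the correct digit string and `g_competitors` are the scores of competing strings; the names `eta`, `gamma` and `theta` are illustrative smoothing parameters. In an integrated ANN/HMM system, the gradient of this loss would be propagated back into both the HMM parameters and the ANN feature-transform weights, which is not shown here.

```python
import math
from typing import Sequence


def misclassification_measure(g_correct: float,
                              g_competitors: Sequence[float],
                              eta: float = 1.0) -> float:
    """Generic MCE misclassification measure (illustrative, assumed form):
    d = -g_correct + (1/eta) * log( (1/M) * sum_j exp(eta * g_j) ).
    d > 0 suggests a string error; larger eta approaches the best competitor."""
    if not g_competitors:
        raise ValueError("need at least one competing string score")
    # Numerically stable log-sum-exp over the scaled competitor scores.
    m = max(eta * g for g in g_competitors)
    lse = m + math.log(sum(math.exp(eta * g - m) for g in g_competitors))
    # Soft maximum of the competitor scores (a smoothed "best competitor").
    soft_max = (lse - math.log(len(g_competitors))) / eta
    return -g_correct + soft_max


def mce_loss(d: float, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Sigmoid smoothing of d: a differentiable stand-in for the 0/1 string error."""
    z = gamma * d - theta
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical scores: correct string at -10, two competitors at -12 and -15.
    d = misclassification_measure(-10.0, [-12.0, -15.0])
    print(d, mce_loss(d))
```

A correctly recognized string gives a negative `d` and a loss below 0.5; a large `eta` makes the measure depend mainly on the single best competing string, which matches the N-best style of string-level MCE.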
Bibliographic reference. Rahim, Mazin G. / Lee, Chin-Hui (1996): "Simultaneous ANN feature and HMM recognizer design using string-based minimum classification error (MCE) training", In ICSLP-1996, 1824-1827.