Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

An RNN Based Speech Recognition System with Discriminative Training

Tan Lee (1), P. C. Ching (1), L. W. Chan (2)

(1) Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
(2) Department of Computer Science, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

In our previous work [1], a novel method of using a set of fully connected recurrent neural networks (RNNs) for speech modeling was proposed. Although the RNN model is effective in characterizing individual speech units, the system performs less satisfactorily in speech recognition because of poor discrimination between models. In this paper, an efficient discriminative training procedure is developed for the RNN based recognition system. With discriminative training, each RNN speech model is adjusted to reduce its distance from the designated speech unit while increasing its distances from the others. In addition, a duration-screening process is introduced to enhance the discriminating power of the recognition system. Speaker-dependent recognition experiments have been carried out on 1) 11 isolated Cantonese digits, 2) 58 highly confusable Cantonese CV syllables, and 3) 20 isolated English words. The recognition rates attained are 90.9%, 86.7% and 93.5% respectively.
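The abstract describes a distance-based discriminative objective: shrink the designated model's distance to its speech unit while growing the competitors' distances. The sketch below illustrates one way such an objective could be written; the function name and the exact loss form are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of a distance-based discriminative objective of the
# kind described in the abstract. The specific loss form (target distance
# minus mean competitor distance) is an assumption, not the paper's method.
import numpy as np

def discriminative_loss(distances, target):
    """distances: one prediction error d_k per RNN speech model.
    target: index of the designated speech unit's model.
    The loss decreases when the target model's distance shrinks and
    when the competing models' distances grow."""
    d = np.asarray(distances, dtype=float)
    others = np.delete(d, target)
    # reduce distance to the correct model, increase it for the rest
    return d[target] - others.mean()

# A well-separated case (small target distance, large competitor
# distances) yields a lower loss than a poorly separated one.
well_separated = discriminative_loss([0.1, 0.9, 0.8], target=0)
poorly_separated = discriminative_loss([0.5, 0.6, 0.4], target=0)
```

Minimizing such a loss by gradient descent would adjust each model in the two directions the abstract names: toward its own unit and away from the others.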

Full Paper

Bibliographic reference.  Lee, Tan / Ching, P. C. / Chan, L. W. (1995): "An RNN based speech recognition system with discriminative training", In EUROSPEECH-1995, 1667-1670.