4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper presents a set of effective and efficient techniques to improve the discrimination capability of an isolated word recognizer based on recurrent neural networks (RNNs). The recognizer contains a set of individually trained RNN speech models (RSMs), each representing a different word in the vocabulary. Speech recognition is performed by selecting the RSM that best matches the input utterance. For temporal supervised training of the RSMs, a new error function is introduced in which the contributions of all phonetic components are equalized regardless of their differences in duration. In addition, the learning rate for recurrent connections is amplified, with the aim of strengthening temporal dependency in the RSMs so that they capture the dynamic characteristics of speech signals. Furthermore, a hierarchical training strategy is employed to make discriminative training among the RSMs more efficient. A series of speaker-dependent recognition experiments is performed to evaluate the effectiveness of the proposed techniques.
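The abstract does not give the form of the duration-equalized error function. As an illustrative sketch only, one plausible reading is that each frame's error is divided by the number of frames in its phonetic component, so that every component contributes its mean error rather than its summed error. The function name `equalized_error`, the per-frame error values, and the segment labels below are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def equalized_error(frame_errors, segment_ids):
    """Total error in which every segment contributes its mean frame error.

    Assumption (not from the paper): each frame's error is weighted by
    1 / (number of frames in the same phonetic component).
    """
    if len(frame_errors) != len(segment_ids):
        raise ValueError("one segment id is required per frame")
    durations = Counter(segment_ids)  # number of frames in each segment
    return sum(err / durations[seg]
               for err, seg in zip(frame_errors, segment_ids))


# Hypothetical utterance: a 4-frame vowel followed by a 1-frame stop burst.
frame_errors = [0.5, 0.5, 0.5, 0.5, 2.0]
segment_ids = ["a", "a", "a", "a", "t"]

plain_total = sum(frame_errors)                        # long vowel dominates
equalized_total = equalized_error(frame_errors, segment_ids)
```

In this toy case the unweighted sum lets the long vowel account for half of the total error, whereas under the equalized form the short stop and the long vowel each contribute their own mean error, so short components are not swamped by long ones during training.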
Bibliographic reference. Lee, Tan / Ching, P. C. (1996): "On improving discrimination capability of an RNN based recognizer", In ICSLP-1996, 526-529.