Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
Recently, researchers have been studying how "stress" is represented and how it relates to continuous phone recognition and continuous speech recognition.  has claimed an improvement in error rate from explicitly marking lexical stress in continuous speech recognition. On the other hand,  reported that using two levels of stress (as opposed to one) did not reduce the error rate in phone recognition of continuous speech. The English speaker-dependent continuous speech recognition system developed by Dragon Systems currently uses three levels of lexical stress for each of seventeen vowels. Its English phoneme alphabet also includes twenty-six consonants (among them three syllabic consonants, /M/, /N/, and /L/), for a total of seventy-seven phonemes (17 vowels × 3 stress levels = 51 vowel phonemes, plus 26 consonants). The multiple stress levels lead to a large number of parameters to estimate when training models and performing recognition. Motivated by the desire to shrink the parameter set, this paper is a preliminary study of how to maintain recognition performance when the number of stress levels is reduced. Performance is reported on the Wall Street Journal 5000-word vocabulary recognition task.
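The phoneme count above follows from splitting each vowel into one phoneme per stress level while leaving consonants unstressed. The sketch below reproduces that arithmetic and shows how the inventory shrinks if fewer stress levels are kept. The vowel and consonant counts come from the abstract; the function name is hypothetical, and the 2-level and 1-level figures are plain arithmetic, not results reported in the paper. How many model parameters each phoneme contributes depends on system details the abstract does not give, so the sketch counts phonemes only.

```python
# Phoneme inventory size as a function of the number of lexical stress
# levels per vowel. NUM_VOWELS and NUM_CONSONANTS are taken from the text;
# the function name is hypothetical and for illustration only.
NUM_VOWELS = 17
NUM_CONSONANTS = 26  # includes the syllabic consonants /M/, /N/, /L/


def inventory_size(stress_levels: int) -> int:
    """Each vowel yields one phoneme per stress level; consonants carry no stress."""
    if stress_levels < 1:
        raise ValueError("need at least one stress level")
    return NUM_VOWELS * stress_levels + NUM_CONSONANTS


if __name__ == "__main__":
    for levels in (3, 2, 1):
        print(f"{levels} stress level(s): {inventory_size(levels)} phonemes")
```

With three levels this gives the 77 phonemes stated above; collapsing to two levels would give 60, and to a single level 43.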
Bibliographic reference. Bishop, Kathleen (1992): "Modeling sentential stress in the context of a large vocabulary continuous speech recognizer", In ICSLP-1992, 437-440.