First International Conference on Spoken Language Processing (ICSLP 90)
The performance of a small-vocabulary, speaker-dependent robust speech recogniser can be improved by adding more input features in the front-end. Our present speech recognition system employs both static and dynamic spectral representations, which are combined with a linear discriminant analysis. We have performed recognition experiments with CVC words differing only in their initial consonant phoneme (e.g. peep vs. beep) and found that most of the errors are due to the system failing to distinguish voiceless from voiced stop consonants. A number of acoustic cues can improve the voiceless/voiced plosive distinction, in particular the fundamental frequency at voicing onset and the Voice Onset Time (VOT). This paper reports on recognition experiments in which both of these features are extracted from the speech signal and combined with the other features using the linear discriminant network. The results confirmed that the addition of these two input features improved the performance of the recogniser on confusable word pairs.
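The core idea of the abstract — separating voiceless from voiced stops by feeding VOT and F0-at-voicing-onset into a linear discriminant — can be illustrated with a minimal two-class Fisher discriminant sketch. The feature values below are hypothetical (voiceless stops such as /p/ typically show longer VOT and higher F0 at onset than voiced stops such as /b/), and the numpy-only classifier is an assumption for illustration, not the authors' actual discriminant network:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical cue distributions per class (values for illustration only):
# voiceless /p/: long VOT (ms), higher F0 at voicing onset (Hz)
vot_p = rng.normal(60, 10, n)
f0_p = rng.normal(130, 15, n)
# voiced /b/: short VOT, lower F0 at onset
vot_b = rng.normal(10, 8, n)
f0_b = rng.normal(110, 15, n)

X = np.vstack([np.column_stack([vot_p, f0_p]),
               np.column_stack([vot_b, f0_b])])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = voiceless, 0 = voiced

def fisher_lda(X, y):
    """Two-class Fisher linear discriminant: project onto Sw^-1 (m1 - m0)."""
    X1, X0 = X[y == 1], X[y == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    # Within-class scatter matrix (sum of per-class scatter)
    Sw = np.cov(X1.T) * (len(X1) - 1) + np.cov(X0.T) * (len(X0) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    # Decision threshold at the midpoint of the projected class means
    thresh = w @ (m1 + m0) / 2
    return w, thresh

w, thresh = fisher_lda(X, y)
pred = (X @ w > thresh).astype(float)
accuracy = (pred == y).mean()
```

Because VOT alone separates the two classes well in this synthetic setup, the discriminant recovers a near-perfect boundary; in the paper, these two cues are instead appended to the static and dynamic spectral features before the discriminant step.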
Bibliographic reference. Lefebvre, Claude / Zwierzynski, Dariusz A. (1990): "The use of discriminant neural networks in the integration of acoustic cues for voicing into a continuous-word recognition system", In ICSLP-1990, 1073-1076.