4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
The purpose of this research is to determine how models of human auditory physiology can improve the performance of automatic speech recognition systems. In this study, a series of experiments was undertaken to discover how humans categorize and confuse vowels in natural speech. The recognition task comprised a large number of vowel nuclei isolated from naturally spoken sentences produced by many talkers. Machine vowel classifiers were then trained to match the results of these vowel categorization experiments using two input feature representations: a spectral-energy representation and a representation derived from an auditory model. Classifiers trained on representations derived from the auditory model match human performance and are more robust to noise and spectral filtering than classifiers trained on spectral-energy representations.
Bibliographic reference. Hunke, Martin / Holton, Thomas (1996): "Training machine classifiers to match the performance of human listeners in a natural vowel classification task", In ICSLP-1996, 574-577.