First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

A Comparative Study of Acoustic Representations of Speech for Vowel Classification Using Multi-Layer Perceptrons

Helen M. Meng, Victor W. Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, USA

This paper describes a study comparing several signal representations for context-independent vowel classification. It forms the first step in our investigation of a distinctive-feature-based approach to phonetic recognition. Six different signal representations were investigated. They include the outputs of Seneff's Auditory Model (SAM), mel-scale representations, and the conventional Fourier transform. To ensure a fair and meaningful comparison, the mel-frequency filters were carefully designed to resemble the filters of SAM, and the dimensionality of the feature vectors was constrained to be equal. The representations were compared on the basis of classifying 16 vowels in American English. Experiments with speech degraded by adding white noise were also conducted. Our results are based on over 22,000 vowel tokens excised from 2,750 sentences spoken by 550 speakers. The combined Synchronous and Mean Rate responses from SAM outperformed all the other representations with both undegraded and noisy speech, yielding top-choice accuracies of 66% and 54%, respectively.
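As a rough illustration of the classification setup described in the abstract, the sketch below trains a small multi-layer perceptron to label fixed-length acoustic feature vectors with one of 16 vowel classes. The feature dimensionality, hidden-layer size, synthetic data, and the use of scikit-learn's MLPClassifier are assumptions made for illustration only and are not taken from the paper.

# Minimal sketch of MLP-based vowel classification, assuming 40-dimensional
# acoustic feature vectors (e.g., mel-filterbank or auditory-model outputs)
# and 16 vowel classes; all hyperparameters are illustrative, not from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

NUM_CLASSES = 16      # 16 American English vowels
FEATURE_DIM = 40      # assumed dimensionality of each representation
NUM_TOKENS = 2000     # synthetic stand-in for the excised vowel tokens

# Synthetic data standing in for feature vectors computed around each vowel
# token; in the study these would come from the compared representations.
X = rng.normal(size=(NUM_TOKENS, FEATURE_DIM))
y = rng.integers(0, NUM_CLASSES, size=NUM_TOKENS)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# One hidden layer of 64 units (an assumed size) with a softmax-style
# output over the 16 vowel classes.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# Top-choice (top-1) accuracy, the metric reported in the abstract.
print("top-choice accuracy: %.1f%%" % (100.0 * clf.score(X_test, y_test)))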


Bibliographic reference. Meng, Helen M. / Zue, Victor W. (1990): "A comparative study of acoustic representations of speech for vowel classification using multi-layer perceptrons", in Proc. ICSLP-1990, 1053-1056.