4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Analysis of Speech Segments using Variable Spectral/Temporal Resolution

Xihong Wang, Stephen A. Zahorian, Stefan Auberg

Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA

In this paper we present an approach for efficiently computing a compact temporal/spectral feature set that represents a segment of speech, with an effective resolution that depends on both frequency and time position within the segment. The goal is to mimic the resolution properties of the human auditory system while using a computationally efficient FFT-based front end rather than a more complex auditory model. In particular, we apply both frequency and time "warping" to FFT spectra to obtain good frequency resolution at low frequencies and good time resolution at high frequencies. Time resolution is also varied so that the center of the segment is represented more finely than the endpoints. The resolution is adjusted by selecting "warping" functions that are controlled by a small number of parameters. The method was evaluated experimentally on the classification of six stops extracted from the TIMIT continuous speech database. The best classification rate on test data was 81.2%, obtained with 50 features computed by the method presented.
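The abstract does not specify the warping functions, so the following is only a minimal sketch under stated assumptions. It assumes a bilinear (all-pass) frequency warp with a hypothetical parameter `alpha`, which stretches low frequencies, and a hypothetical sinusoidal time warp with parameter `beta`, which stretches the segment center. A segment of log-magnitude FFT frames is then reduced to a fixed-length feature vector by two cosine projections: first over warped frequency for each frame, then over warped time for each resulting coefficient track. All function names, parameter values, and the 10 x 5 split of the 50 features are illustrative assumptions, not the authors' implementation, and the sketch does not model the frequency-dependent time resolution (finer time resolution at high frequencies) described above.

```python
import math


def bilinear_warp(x, alpha):
    """Map normalized frequency x in [0, 1] onto a warped axis in [0, 1].

    Assumed bilinear (all-pass) warp: for 0 < alpha < 1, low frequencies
    occupy more of the warped axis, i.e. finer effective resolution there.
    """
    w = math.pi * x
    return x + (2.0 / math.pi) * math.atan2(alpha * math.sin(w),
                                            1.0 - alpha * math.cos(w))


def center_warp(t, beta):
    """Map normalized time t in [0, 1] onto a warped axis in [0, 1].

    Hypothetical warp with slope 1 - beta*cos(2*pi*t): the slope peaks at
    t = 0.5, so the segment center gets more of the axis (0 <= beta < 1).
    """
    return t - (beta / (2.0 * math.pi)) * math.sin(2.0 * math.pi * t)


def warped_cosine_projection(values, warp, n_coef):
    """Project uniformly spaced samples onto cosines of a warped axis.

    Approximates c_k = integral_0^1 v(u) cos(pi*k*u) du with u = warp(x),
    using the warped width of each sample cell as du.
    """
    n = len(values)
    coefs = [0.0] * n_coef
    for i, v in enumerate(values):
        du = warp((i + 1) / n) - warp(i / n)   # warped cell width
        u = warp((i + 0.5) / n)                # warped cell center
        for k in range(n_coef):
            coefs[k] += v * math.cos(math.pi * k * u) * du
    return coefs


def segment_features(log_spectra, n_freq=10, n_time=5, alpha=0.45, beta=0.5):
    """Reduce a segment (list of frames of log-magnitude bins 0..fs/2)
    to n_freq * n_time features."""
    # Step 1: cosine coefficients over warped frequency, one set per frame.
    per_frame = [warped_cosine_projection(frame,
                                          lambda x: bilinear_warp(x, alpha),
                                          n_freq)
                 for frame in log_spectra]
    # Step 2: cosine coefficients over warped time for each coefficient track.
    features = []
    for k in range(n_freq):
        track = [coefs[k] for coefs in per_frame]
        features.extend(warped_cosine_projection(track,
                                                 lambda t: center_warp(t, beta),
                                                 n_time))
    return features


if __name__ == "__main__":
    # Synthetic segment: 40 frames of a flat log spectrum (129 bins).
    frames = [[2.0] * 129 for _ in range(40)]
    feats = segment_features(frames)
    print(len(feats))  # 50
```

Because both warps are applied only through the cosine basis, the frame spectra themselves are never resampled; changing `alpha` or `beta` redistributes resolution without changing the number of features.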


Bibliographic reference. Wang, Xihong / Zahorian, Stephen A. / Auberg, Stefan (1996): "Analysis of speech segments using variable spectral/temporal resolution", In ICSLP-1996, 1221-1224.