5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Speaker Normalization Through Formant-Based Warping of the Frequency Scale

Evandro B. Gouvea, Richard M. Stern

Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to model acoustical variability among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers. Recent successful speaker normalization algorithms have incorporated a speaker-specific frequency warping into the initial signal processing stages. These algorithms, however, do not make extensive use of acoustic features contained in the incoming speech. In this paper we study the possible benefits of using acoustic features in speaker normalization algorithms based on frequency warping. In particular, we study the extent to which such features, specifically formant frequencies, can improve recognition accuracy and reduce the computational complexity of speaker normalization. We examine the characteristics and limitations of several types of feature sets and warping functions, and we compare their performance with that of existing algorithms.

Bibliographic reference. Gouvea, Evandro B. / Stern, Richard M. (1997): "Speaker normalization through formant-based warping of the frequency scale", In EUROSPEECH-1997, 1139-1142.