Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Utterance Clustering for Large Vocabulary Continuous Speech Recognition

G. D. Cook, A. J. Robinson

Cambridge University Engineering Department, Cambridge, England

Conventional speaker-independent speech recognition systems are trained using data from many different speakers. Inter-speaker variability is a major problem because parametric representations of speech are highly speaker dependent. This paper describes a technique that allows speaker-dependent parameters to be considered when building a speaker-independent speech recognition system. The technique is based on utterance clustering, in which subsets of the training data are formed and the variability within each subset is minimized. Cluster-dependent connectionist models are then used to estimate phone probabilities as part of a hybrid connectionist/hidden Markov model, large-vocabulary, speaker-independent speech recognition system. The system has been evaluated on the ARPA Wall Street Journal continuous speech recognition task.


Bibliographic reference. Cook, G. D. / Robinson, A. J. (1995): "Utterance clustering for large vocabulary continuous speech recognition", in EUROSPEECH-1995, 219-222.