Conventional speaker-independent speech recognition systems are trained using data from many different speakers. Inter-speaker variability is a major problem because parametric representations of speech are highly speaker dependent. This paper describes a technique that allows speaker-dependent parameters to be considered when building a speaker-independent speech recognition system. The technique is based on utterance clustering, in which subsets of the training data are formed and the variability within each subset is minimized. Cluster-dependent connectionist models are then used to estimate phone probabilities as part of a hybrid connectionist/hidden Markov model, large-vocabulary, speaker-independent speech recognition system. The system has been evaluated on the ARPA Wall Street Journal continuous speech recognition task.
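The idea of forming training subsets with low internal variability can be illustrated with a small sketch. The abstract does not state which features or clustering criterion the authors used, so everything below is an assumption made for illustration: each utterance is summarized by the mean of its feature frames, and plain k-means groups those means. The names `utterance_mean`, `kmeans`, and `cluster_utterances` are hypothetical, and the data are toy values.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def utterance_mean(frames: Sequence[Sequence[float]]) -> Vector:
    """Summarize one utterance by the mean of its feature frames (assumed summary)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns one cluster index per point.

    Centroids start at the first k points, which keeps the sketch deterministic.
    This is an illustrative stand-in, not the paper's clustering procedure.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # which reduces the within-cluster variance.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_utterances(
    utterances: Dict[str, Sequence[Sequence[float]]], k: int
) -> Dict[int, List[str]]:
    """Group utterance IDs into k subsets of the training data."""
    ids = list(utterances)
    labels = kmeans([utterance_mean(utterances[u]) for u in ids], k)
    clusters: Dict[int, List[str]] = {c: [] for c in range(k)}
    for utt_id, lab in zip(ids, labels):
        clusters[lab].append(utt_id)
    return clusters


# Toy 2-D "acoustic" frames for four utterances (made-up values).
toy_utterances = {
    "utt_a": [[0.0, 0.1], [0.2, 0.1]],
    "utt_b": [[5.0, 5.2], [5.2, 5.0]],
    "utt_c": [[0.1, 0.0], [0.1, 0.2]],
    "utt_d": [[4.9, 5.1], [5.1, 4.9]],
}
clusters = cluster_utterances(toy_utterances, k=2)
print(clusters)
```

In a system of the kind the abstract describes, each resulting subset would then be used to train its own cluster-dependent model; how the authors select or combine those models at recognition time is not stated here and is left out of the sketch.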
Bibliographic reference. Cook, G. D. / Robinson, A. J. (1995): "Utterance clustering for large vocabulary continuous speech recognition", In EUROSPEECH-1995, 219-222.