This paper compares and analyzes methods for creating multiple baseforms in an HMM-based speaker-independent speech recognition system. The multiple baseforms are used to better model differences in speaker characteristics, such as sex or regional accent. The required speaker classes are obtained either from known categorical differences (sex, address) or through an adaptive clustering procedure. Both methods are compared in a Dutch/Flemish digit recognizer, using telephone recordings from 600 Dutch and 600 Flemish speakers. A two-baseform system based on regional subdivision leads to a 3% improvement in recognition performance and yields results comparable to the within-class performance obtained with a single model. Furthermore, division on the basis of accent is significantly more advantageous than division based on sex. Iterative clustering procedures generally do not work well, as the different models tend to overlap more with every iteration step. Ultimately, it was found that subdividing speakers into classes only helps if the number of speakers per class remains large (typically > 200).
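As an illustration of what an adaptive clustering procedure of this general kind can look like, the following is a minimal sketch and not the authors' implementation. It assumes that each speaker is represented by a list of one-dimensional feature values and that each class model is a single Gaussian standing in for an HMM baseform; the names `cluster_speakers`, `fit` and `log_likelihood` are hypothetical. Each iteration re-estimates one model per class and then reassigns every speaker to the model that scores that speaker's data best.

```python
import math

def log_likelihood(frames, mean, var):
    """Total log-likelihood of 1-D frames under a Gaussian (stand-in for an HMM score)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )

def fit(frames, var_floor=1e-3):
    """Maximum-likelihood mean and floored variance of the pooled frames."""
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, max(var, var_floor)

def cluster_speakers(speakers, init_labels, n_classes=2, n_iter=10):
    """Alternate between model re-estimation and speaker reassignment.

    speakers    : list of per-speaker frame lists (assumed toy representation)
    init_labels : initial class index for each speaker
    Returns the class index of each speaker after convergence or n_iter passes.
    """
    labels = list(init_labels)
    for _ in range(n_iter):
        # Re-estimate one model per class from the speakers currently in it.
        models = []
        for c in range(n_classes):
            pooled = [x for frames, lab in zip(speakers, labels)
                      if lab == c for x in frames]
            if not pooled:
                # Empty class: fall back to a model of all the data.
                pooled = [x for frames in speakers for x in frames]
            models.append(fit(pooled))
        # Reassign each speaker to the class whose model scores it highest.
        new_labels = [
            max(range(n_classes),
                key=lambda c: log_likelihood(frames, *models[c]))
            for frames in speakers
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

In this toy setting the initial labels could come from a known categorical split (for example region), with the loop then refining them. The sketch does not reproduce the model overlap reported above for iterative procedures; it only shows the structure of the assign-and-re-estimate loop.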
Bibliographic reference. Compernolle, Dirk Van / Smolders, J. / Jaspers, P. / Hellemans, T. (1991): "Speaker clustering for dialectic robustness in speaker independent recognition", In EUROSPEECH-1991, 723-726.