5th International Conference on Spoken Language Processing, Sydney, Australia
Speaker adaptation must satisfy two conflicting requirements. First, the transform must be powerful enough to model the speaker. Second, the transform should be rapidly estimated for any particular speaker. Recently, the most popular adaptation schemes have used many parameters to adapt the models. This paper examines cluster adaptive training (CAT), an adaptation scheme that requires few parameters to adapt the models. It may be viewed as a simple extension to speaker clustering: a linear interpolation of the cluster means is used as the mean for the particular speaker. This scheme falls naturally into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, re-estimation formulae are given for the cluster means, represented both explicitly and by sets of transforms of some canonical mean. On a speaker-independent task, CAT reduced the word error rate using very little adaptation data, compared to a standard speaker-independent model set.
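The interpolation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a single Gaussian with diagonal variance, exactly two clusters, every adaptation frame assigned to that Gaussian with occupancy 1, and unconstrained weights. Under those assumptions the maximum likelihood weights reduce to a variance-weighted least-squares fit. The function names `interpolate_mean` and `estimate_weights_two_clusters` are hypothetical.

```python
def interpolate_mean(cluster_means, weights):
    """Speaker mean as a weighted sum of the cluster means (one Gaussian)."""
    dim = len(cluster_means[0])
    return [sum(w * m[d] for w, m in zip(weights, cluster_means))
            for d in range(dim)]


def estimate_weights_two_clusters(cluster_means, frames, variances):
    """ML interpolation weights for two clusters and one Gaussian.

    Assumptions (illustrative only): diagonal variance, every frame has
    occupancy 1 for this Gaussian, weights are unconstrained.
    Solves G * w = k, where G = n * M^T S^-1 M and k = M^T S^-1 sum_t o(t),
    with M holding the two cluster means as columns.
    """
    m1, m2 = cluster_means
    dims = range(len(m1))
    n = len(frames)
    obs_sum = [sum(f[d] for f in frames) for d in dims]

    # Entries of the symmetric 2x2 matrix G.
    g11 = n * sum(m1[d] * m1[d] / variances[d] for d in dims)
    g12 = n * sum(m1[d] * m2[d] / variances[d] for d in dims)
    g22 = n * sum(m2[d] * m2[d] / variances[d] for d in dims)

    # Entries of the vector k.
    k1 = sum(m1[d] * obs_sum[d] / variances[d] for d in dims)
    k2 = sum(m2[d] * obs_sum[d] / variances[d] for d in dims)

    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("cluster means are linearly dependent")

    # Closed-form inverse of the 2x2 system.
    return [(g22 * k1 - g12 * k2) / det, (g11 * k2 - g12 * k1) / det]


# Toy data: two cluster means and two adaptation frames from one speaker.
means = [[1.0, 0.0], [0.0, 1.0]]
frames = [[0.2, 0.8], [0.4, 0.6]]
weights = estimate_weights_two_clusters(means, frames, [1.0, 1.0])
speaker_mean = interpolate_mean(means, weights)
```

Only two numbers are estimated per speaker here, which is the sense in which such a scheme needs few parameters; a full system would accumulate the same statistics over all Gaussians, weighted by their occupancies.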
Bibliographic reference. Gales, Mark J. F. (1998): "Cluster adaptive training for speech recognition", In ICSLP-1998, paper 0375.