4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
We are exploring ways to rapidly adapt our neural network classifiers to new speakers and conditions using very small amounts of speech, as little as one or a few words. Our approach is to perform a speaker-dependent warping of the frequency scale by selecting a Bark offset for each speaker. For each speaker, we choose the offset that maximizes our recognizer output score on the adaptation utterance, and we then apply that offset when evaluating all other utterances by the same speaker. To test our approach, we evaluate a recognizer trained on adult speech on children's speech from the same task, both before and after adaptation to each child's voice. Using only a single digit for adaptation, we reduced the word error rate for children's speech from 9.6% to 4.2%. Using a seven-digit utterance further reduced the error rate to 3.5%.
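The offset selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Hz-to-Bark conversion uses Traunmueller's approximation, which is an assumption here because the abstract does not say which Bark formula was used. The names `warp_frequency`, `select_bark_offset`, and `toy_score`, the candidate grid, and the peak location of the toy score are all hypothetical. In a real system the scoring function would run the recognizer on the adaptation utterance with the warped front end and return its output score.

```python
from typing import Callable, Iterable


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark using Traunmueller's approximation (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def warp_frequency(f_hz: float, offset_bark: float) -> float:
    """Shift a frequency by a constant offset measured on the Bark scale."""
    return bark_to_hz(hz_to_bark(f_hz) + offset_bark)


def select_bark_offset(
    score: Callable[[float], float],
    candidates: Iterable[float],
) -> float:
    """Return the candidate offset with the highest score.

    `score(offset)` is a stand-in for the recognizer output score on the
    adaptation utterance when the front end is warped by `offset`.
    """
    return max(candidates, key=score)


def toy_score(offset_bark: float) -> float:
    """Hypothetical score that peaks at -1.0 Bark, for illustration only."""
    return -((offset_bark - (-1.0)) ** 2)


# Hypothetical search grid: -3.0 to +3.0 Bark in steps of 0.25.
candidates = [step * 0.25 for step in range(-12, 13)]

# Pick the offset once, on the adaptation utterance...
best_offset = select_bark_offset(toy_score, candidates)

# ...then reuse it for every other utterance from the same speaker.
warped_1khz = warp_frequency(1000.0, best_offset)
```

Because the search is a simple argmax over a small grid and needs no transcription of the adaptation utterance, the procedure is unsupervised and costs one recognizer pass per candidate offset.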
Bibliographic reference. Burnett, Daniel C. / Fanty, Mark (1996): "Rapid unsupervised adaptation to children's speech on a connected-digit task", In ICSLP-1996, 1145-1148.