4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Rapid Unsupervised Adaptation to Children's Speech on a Connected-Digit Task

Daniel C. Burnett, Mark Fanty

Center for Spoken Language Understanding, Oregon Graduate Institute of Science & Technology, OR, USA

We are exploring ways to rapidly adapt our neural network classifiers to new speakers and conditions using very small amounts of speech, as little as one or a few words. Our approach is to perform a speaker-dependent warping of the frequency scale by selecting a Bark offset for each speaker. For each speaker, we choose the offset that maximizes our recognizer's output score on the adaptation utterance, and we then apply that offset when evaluating all other utterances from the same speaker. To test the approach, we evaluate a recognizer trained on adult speech on children's speech from the same task, both before and after adaptation to each child's voice. Using only a single digit for adaptation, we reduced the word error rate on children's speech from 9.6% to 4.2%. Using a seven-digit utterance for adaptation further reduced the error rate to 3.5%.
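The offset selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Traunmüller's approximation for converting between Hz and Bark (the abstract does not say which Bark formula was used), a simple grid search over candidate offsets, and a hypothetical `score_fn(utterance, offset)` callable standing in for the recognizer's output score. The names `warp_frequency`, `select_bark_offset`, and `toy_score` are illustrative only.

```python
from typing import Any, Callable, Sequence


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark using Traunmueller's approximation (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark: float) -> float:
    """Algebraic inverse of hz_to_bark (valid for z_bark < 26.28)."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def warp_frequency(f_hz: float, offset_bark: float) -> float:
    """Shift a frequency by a constant offset on the Bark scale."""
    return bark_to_hz(hz_to_bark(f_hz) + offset_bark)


def select_bark_offset(
    utterance: Any,
    score_fn: Callable[[Any, float], float],
    candidates: Sequence[float],
) -> float:
    """Return the candidate offset with the highest score on the utterance.

    score_fn is a hypothetical stand-in for the recognizer: it should warp
    the utterance's features by the given offset and return the output score.
    No transcription is needed, which is what makes the search unsupervised.
    """
    return max(candidates, key=lambda offset: score_fn(utterance, offset))


# Toy stand-in for a recognizer score that peaks at an offset of -1.0 Bark.
def toy_score(utterance: Any, offset: float) -> float:
    return -((offset + 1.0) ** 2)


# Candidate offsets from -2.0 to +1.0 Bark in 0.5 Bark steps (assumed grid).
candidate_offsets = [0.5 * k for k in range(-4, 3)]
best_offset = select_bark_offset("adaptation digit", toy_score, candidate_offsets)
print(best_offset)
```

Once selected, the same offset would be reused for every other utterance from that speaker, as the abstract describes. The grid range and step size here are arbitrary placeholders.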

Bibliographic reference. Burnett, Daniel C. / Fanty, Mark (1996): "Rapid unsupervised adaptation to children's speech on a connected-digit task", in ICSLP-1996, 1145-1148.