EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Why is Automatic Recognition of Children's Speech Difficult?

Qun Li, Martin J. Russell

University of Birmingham, UK

This paper is concerned with automatic recognition of children’s speech. The paper begins with a comparison of vowel formant frequencies for adults’ and children’s speech, and notes that in many cases the average value of F3 for children is greater than 4 kHz. Next, it is shown that recognition accuracy for children’s speech degrades rapidly as bandwidth is reduced below 6 kHz. Finally, it is demonstrated that the choice of front-end signal processing parameters, such as analysis window length and mel-scale filter widths, has little effect on recognition accuracy for children’s speech. It is concluded that bandwidth reduction is a major contributor to the difficulty of recognizing children’s speech.
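One way to see why bandwidth matters for a mel-scale front end is to count how many filterbank channels sit above a given cutoff. Those channels carry the spectral region where, per the abstract, children's F3 often lies. The sketch below is illustrative only. It assumes the common `2595 * log10(1 + f/700)` mel formula, a hypothetical bank of 24 triangular filters spanning 0 to 8 kHz (i.e. 16 kHz sampling), and a 4 kHz cutoff. None of these settings are taken from the paper.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # Assumed mel formula (a widely used variant; the paper's exact choice is not stated).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centre_frequencies(num_filters: int, f_max_hz: float,
                           f_min_hz: float = 0.0) -> list[float]:
    """Centre frequencies (Hz) of num_filters triangular filters spaced
    evenly on the mel scale between f_min_hz and f_max_hz (edges excluded)."""
    m_lo, m_hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (num_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(num_filters)]


def count_above(centres_hz: list[float], cutoff_hz: float) -> int:
    """Number of filters whose centre lies above cutoff_hz, i.e. channels
    that would carry no signal if the band were limited to cutoff_hz."""
    return sum(1 for c in centres_hz if c > cutoff_hz)


if __name__ == "__main__":
    # Hypothetical configuration: 24 filters over 0-8 kHz, band limited to 4 kHz.
    centres = mel_centre_frequencies(24, 8000.0)
    lost = count_above(centres, 4000.0)
    print(f"{lost} of {len(centres)} mel filters lie above 4 kHz")
```

Because the mel scale compresses high frequencies, only a minority of channels lie above 4 kHz under these assumptions. A band limit at that point still removes every channel in the region where a child's F3 may fall.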


Bibliographic reference: Li, Qun / Russell, Martin J. (2001): "Why is automatic recognition of children's speech difficult?", In EUROSPEECH-2001, 2671-2674.