A survey about databases of children's speech

Felix Claus, Hamurabi Gamboa Rosales, Rico Petrick, Horst-Udo Hain, Rüdiger Hoffmann

In this paper we survey databases of children's speech. Automatic speech recognition (ASR) for children is a current research trend, so databases of children's speech are needed both for training and for testing ASR systems. However, unlike adult speech corpora, children's speech databases are rarely available, and the current literature offers no overview of the existing ones. Most children's speech databases contain English speech recorded from children aged between 6 and 18 years; these are described in the first part of this paper. Databases for German and other languages, which are even rarer than English ones, are covered next. Recordings of preschool children are particularly rare and are therefore treated separately. Because preschool children cannot read, traditional recording methods cannot be applied, which makes recording their speech complex. Some ideas for addressing the difficulties of recording speech databases of preschool children are presented. Using these methods, a small database of German children's speech has been created. Furthermore, some statistics about children's speech data are presented.

