Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations

Zixing Zhang, Alejandrina Cristia, Anne Warlaumont, Björn Schuller

A key outstanding task for speech technology is dealing with non-standard speakers, notably young children. Distinguishing children's linguistic from non-linguistic vocalisations is crucial for a number of applied and fundamental research goals, yet few systems are available for this classification. This paper investigates three approaches: (i) two large-scale frame-level acoustic feature sets (eGeMAPS and ComParE16) followed by a dynamic model (GRU-RNN); (ii) two kinds of static feature sets derived on the segment level (functional-based and Bag of Audio Words) combined with a static model (SVM); and (iii) representations learnt automatically from the raw voice signals by an end-to-end system. All are compared against a simple phonetically-inspired baseline. The systems are applied to a large database of children's vocalisations (total N = 6,298) drawn from daylong recordings gathered in Namibia, Bolivia and Vanuatu. All of the systems outperform the baseline, with the highest test-set performance obtained by the GRU-RNN using ComParE16 features. We identify promising paths for further research, including applying a finer-grained classification of children's vocalisations to these data and exploring other feature systems.
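To make the difference between frame-level and segment-level features concrete, the following is a minimal sketch of how a variable-length sequence of frame-level vectors might be collapsed into a fixed-length static vector, either through statistical functionals or through a Bag of Audio Words histogram. It is illustrative only and is not the authors' pipeline: the function names, the choice of mean and standard deviation as functionals, the hand-written two-entry codebook, and the toy two-dimensional frames are all assumptions made for this example. In practice the codebook would be learnt from training data and the resulting vectors passed to a static classifier such as an SVM.

```python
import math


def functionals(frames):
    """Collapse frame-level vectors into one static vector.

    Returns the per-dimension means followed by the per-dimension
    (population) standard deviations. The choice of these two
    functionals is an assumption for illustration.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def bag_of_audio_words(frames, codebook):
    """Histogram of nearest-codeword assignments, normalised to sum to 1.

    Each frame is assigned to the codeword with the smallest squared
    Euclidean distance; ties go to the lower codeword index.
    """
    counts = [0] * len(codebook)
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        counts[dists.index(min(dists))] += 1
    return [c / len(frames) for c in counts]


# Toy segment: three frames of two hypothetical acoustic descriptors.
segment = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
# Hypothetical codebook; a real one would be learnt from training frames.
codebook = [[0.0, 1.0], [4.0, 1.0]]

static_vector = functionals(segment)
histogram = bag_of_audio_words(segment, codebook)
```

Both functions return a vector whose length does not depend on the number of frames in the segment, which is what allows a static model to be trained on vocalisations of different durations.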

DOI: 10.21437/Interspeech.2018-2523

Cite as: Zhang, Z., Cristia, A., Warlaumont, A., Schuller, B. (2018) Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations. Proc. Interspeech 2018, 2588-2592, DOI: 10.21437/Interspeech.2018-2523.

  author={Zixing Zhang and Alejandrina Cristia and Anne Warlaumont and Björn Schuller},
  title={Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations},
  booktitle={Proc. Interspeech 2018},