On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children

Gary Yeung, Abeer Alwan

Automatic speech recognition (ASR) systems for children lag behind adult ASR in performance. The specific problems of child ASR, and the methods for evaluating it, have not yet been fully investigated. Recent work from the robotics community suggests that ASR for kindergarten speech is especially difficult, even though this age group may benefit most from voice-based educational and diagnostic tools. Our study focused on ASR performance for specific grade levels (K-10) using a word identification task. Grade-specific ASR systems were evaluated, with particular attention to kindergarten-aged children (5-6 years old). Experiments investigated grade-specific interactions with triphone models using feature-space maximum likelihood linear regression (fMLLR), vocal tract length normalization (VTLN), and subglottal resonance (SGR) normalization. Our results indicate that kindergarten ASR performs dramatically worse than even 1st grade ASR, likely due to large speech variability at that age. As such, ASR systems may require targeted evaluations on kindergarten speech rather than being evaluated under the guise of "child ASR." Additionally, results show that systems trained on kindergarten speech (matched conditions) may be less suitable than systems trained on 1st grade speech (mismatched-grade training). Finally, we analyzed the phonetic errors made by the kindergarten ASR.

DOI: 10.21437/Interspeech.2018-2297

Cite as: Yeung, G., Alwan, A. (2018) On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children. Proc. Interspeech 2018, 1661-1665, DOI: 10.21437/Interspeech.2018-2297.

  author={Gary Yeung and Abeer Alwan},
  title={On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children},
  booktitle={Proc. Interspeech 2018},