SLaTE 2015 - Workshop on Speech and Language Technology in Education

Leipzig, Germany
September 4-5, 2015

Analysis of Phone Errors in Computer Recognition of Children’s Speech

Eva Fringi (1,2), Jill Fain Lehman (2), Martin Russell (1)

(1) School of Electronic, Electrical and Systems Engineering, University of Birmingham, UK
(2) Disney Research Pittsburgh, PA, USA

Automatic speech recognition (ASR) for children’s speech is more difficult than for adults’ speech. This paper explores two explanations of this phenomenon: (A) that it is due to predictable phonological effects associated with language acquisition in children, or (B) that it is due to the general increase in acoustic variability that has been observed in children’s speech. Phone recognition experiments are conducted on hand-labelled data for children aged 5 to 6. A statistical comparison of the resulting confusion matrix with that for adult speech (TIMIT) shows significant increases in phone substitution rates for children, some of which correspond to established phonological phenomena (type A errors). However, these account for only a small proportion of errors, and errors associated with general acoustic variability (type B) appear to account for the majority. The study also shows significantly more deletion errors in ASR of children’s speech. Overall, the results suggest that attempts to improve ASR accuracy for children’s speech by accommodating phonological phenomena associated with language acquisition, for example by changing the pronunciation dictionary, are unlikely to deliver significant success in the short term, and that coping with the increased acoustic variability in children’s speech should be the immediate priority.
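
The abstract does not state which statistical test was used to compare the children's and adults' confusion matrices. Purely as an illustration, the sketch below assumes a per-phone two-proportion z-test: for one reference phone, it compares the fraction of tokens recognised as a given other phone (a substitution) or deleted in children's versus adults' speech. The function names, the use of None to mark a deletion, and all counts are hypothetical; they are not the authors' method or data.

```python
import math
from collections import Counter

DELETION = None  # hypothetical hypothesis label marking a deleted phone


def phone_rate(confusions, ref, hyp):
    """Return (count, total): tokens of reference phone `ref` recognised as `hyp`.

    `confusions` maps (reference, hypothesis) pairs to token counts.
    """
    total = sum(n for (r, _), n in confusions.items() if r == ref)
    return confusions[(ref, hyp)], total


def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test of H0: k1/n1 == k2/n2, using the pooled proportion."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:
        # Both rates are 0 or both are 1: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for reference phone "th" (illustrative only).
child = Counter({("th", "th"): 50, ("th", "f"): 30, ("th", DELETION): 20})
adult = Counter({("th", "th"): 180, ("th", "f"): 10, ("th", DELETION): 10})

for hyp in ("f", DELETION):
    k1, n1 = phone_rate(child, "th", hyp)
    k2, n2 = phone_rate(adult, "th", hyp)
    z, p = two_proportion_z_test(k1, n1, k2, n2)
    label = hyp if hyp is not None else "<deletion>"
    print(f"th -> {label}: child {k1 / n1:.2f}, adult {k2 / n2:.2f}, "
          f"z = {z:.2f}, p = {p:.2g}")
```

Running one test per confusion-matrix cell treats each cell independently; with many cells, a multiple-comparison correction would normally also be needed.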


Bibliographic reference. Fringi, Eva / Lehman, Jill Fain / Russell, Martin (2015): "Analysis of phone errors in computer recognition of children’s speech", In SLaTE-2015, 101-105.