Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech

Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell, John H.L. Hansen

Speech sound disorders (SSDs) are the most prevalent type of communication disorder among preschoolers. The earlier an SSD is identified, the earlier an intervention can be provided to potentially reduce the social/academic impact of the disorder. The challenge lies in early identification of such disorders. In this study, 29 carefully selected words were produced by 165 children from 3 to 6 years of age. The audio recordings were collected by parents using a mobile application/platform. "Ground truth" child status as 'typically developing' vs 'at risk' was based on a percentage of consonants correct-revised growth curve model. State-of-the-art speech processing/speaker recognition models were employed along with our clinical group verification framework. Results showed that text-dependent i-Vector models were superior to both text-dependent and text-independent Gaussian Mixture Models (GMMs) for correct classification of children. Fusing individual word-level i-Vector models provides insight into word and consonant groupings that are more indicative of 'at risk' child speech.

DOI: 10.21437/Interspeech.2018-1465

Cite as: Kothalkar, P., Rudolph, J., Dollaghan, C., McGlothlin, J., Campbell, T., Hansen, J.H. (2018) Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech. Proc. Interspeech 2018, 1681-1685, DOI: 10.21437/Interspeech.2018-1465.

  author={Prasanna Kothalkar and Johanna Rudolph and Christine Dollaghan and Jennifer McGlothlin and Thomas Campbell and John H.L. Hansen},
  title={Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech},
  booktitle={Proc. Interspeech 2018},