Mixtures of Deep Neural Experts for Automated Speech Scoring

Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna

The paper addresses the automatic assessment of second language proficiency from language learners' spoken responses to test prompts. The task is highly relevant to the field of computer-assisted language learning. The approach presented in the paper relies on two separate modules: (1) an automatic speech recognition system that yields text transcripts of the spoken interactions involved, and (2) a multiple classifier system based on deep learners that assigns the transcripts to proficiency classes. Different deep neural network architectures (both feed-forward and recurrent) are specialized over diverse representations of the texts in terms of: a reference grammar, the outcome of probabilistic language models, several word embeddings, and two bag-of-words models. The individual classifiers are combined either via a probabilistic pseudo-joint model or via a neural mixture of experts. On the data of the third Spoken CALL Shared Task challenge, the approach obtained the highest values reported to date on three popular evaluation metrics.
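To make the idea of combining classifiers through a mixture of experts more concrete, the following is a minimal, generic sketch and not the authors' implementation. It assumes each expert outputs a posterior distribution over the classes and that a gating component produces one score per expert. The combined posterior is the gate-weighted sum of the expert posteriors. The names `mixture_of_experts`, `expert_posteriors`, and `gate_logits`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def mixture_of_experts(expert_posteriors, gate_logits):
    """Combine expert posteriors with softmax gate weights.

    expert_posteriors: array of shape (n_experts, n_classes); each row sums to 1.
    gate_logits: array of shape (n_experts,); unnormalized gate scores
                 (in a trained system these would come from a gating network).
    Returns an array of shape (n_classes,) that sums to 1.
    """
    weights = softmax(gate_logits)      # convex weights over the experts
    return weights @ expert_posteriors  # weighted sum of the posteriors


# Hypothetical posteriors from three experts over two classes (assumed values).
expert_posteriors = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
])

# Equal gate scores give equal weights, so the result is the plain average.
gate_logits = np.array([0.0, 0.0, 0.0])

combined = mixture_of_experts(expert_posteriors, gate_logits)
predicted_class = int(np.argmax(combined))
print(combined, predicted_class)
```

Because the gate weights are non-negative and sum to one, the combined vector remains a valid probability distribution whenever each expert's output is one. In a learned mixture the gate scores would depend on the input, which lets different experts dominate for different transcripts.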

 DOI: 10.21437/Interspeech.2020-1055

Cite as: Papi, S., Trentin, E., Gretter, R., Matassoni, M., Falavigna, D. (2020) Mixtures of Deep Neural Experts for Automated Speech Scoring. Proc. Interspeech 2020, 3845-3849, DOI: 10.21437/Interspeech.2020-1055.

  author={Sara Papi and Edmondo Trentin and Roberto Gretter and Marco Matassoni and Daniele Falavigna},
  title={{Mixtures of Deep Neural Experts for Automated Speech Scoring}},
  booktitle={Proc. Interspeech 2020},