Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts

Matthew Perez, Zakaria Aldeneh, Emily Mower Provost


Robust speech recognition is a key prerequisite for semantic feature extraction in automatic aphasic speech analysis. However, standard one-size-fits-all automatic speech recognition models perform poorly when applied to aphasic speech. One reason for this is the wide range of speech intelligibility due to different levels of severity (i.e., higher severity lends itself to less intelligible speech). To address this, we propose a novel acoustic model based on a mixture of experts (MoE), which handles the varying intelligibility stages present in aphasic speech by explicitly defining severity-based experts. At test time, the contribution of each expert is decided by estimating speech intelligibility with a speech intelligibility detector (SID). We show that our proposed approach significantly reduces phone error rates across all severity stages in aphasic speech compared to a baseline approach that does not incorporate severity information into the modeling process.
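The combination step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the expert networks, phone inventory, and SID scores here are all hypothetical stand-ins. The core idea shown is that a speech intelligibility detector produces severity scores, which are turned into mixture weights that form a convex combination of the severity-specific experts' phone posteriors.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_phone_posteriors(features, experts, sid_logits):
    """Mix per-expert phone posteriors using weights from a
    speech intelligibility detector (SID).

    features   : acoustic features for one frame (placeholder here)
    experts    : callables, one per severity level; each maps
                 features -> a phone posterior distribution
    sid_logits : SID scores, one per severity level (hypothetical)
    """
    weights = softmax(np.asarray(sid_logits, dtype=float))
    posteriors = np.stack([expert(features) for expert in experts])
    return weights @ posteriors  # convex combination of expert outputs

# Toy demo: two "experts" (mild, severe) over a 3-phone inventory.
mild_expert = lambda f: np.array([0.7, 0.2, 0.1])
severe_expert = lambda f: np.array([0.3, 0.3, 0.4])
mixed = moe_phone_posteriors(None, [mild_expert, severe_expert],
                             sid_logits=[2.0, 0.5])
```

Because the SID weights sum to one, the mixed output is still a valid posterior distribution; when the SID is confident the speech is mildly impaired, the mild expert dominates the combination.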


DOI: 10.21437/Interspeech.2020-2049

Cite as: Perez, M., Aldeneh, Z., Mower Provost, E. (2020) Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts. Proc. Interspeech 2020, 4986-4990, DOI: 10.21437/Interspeech.2020-2049.


@inproceedings{Perez2020,
  author={Matthew Perez and Zakaria Aldeneh and Emily Mower Provost},
  title={{Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4986--4990},
  doi={10.21437/Interspeech.2020-2049},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2049}
}