Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks

Ming Tu, Visar Berisha, Julie Liss


Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made a given decision. In this paper, we propose an interpretable DNN-based model for objective assessment of dysarthric speech in speech therapy applications. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of generating a severity prediction directly from a high-dimensional acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNN. During inference, the model provides an estimate of severity at the output of the network, together with a set of explanatory features from the intermediate layer that account for the final decision. We evaluate the model on a dysarthric speech dataset and show that it provides an interpretable output that is highly correlated with the subjective evaluation of Speech-Language Pathologists (SLPs).
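The bottleneck idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the `tanh` activation, the single hidden layer, the weighting factor `alpha`, and every name below (`init_params`, `forward`, `joint_loss`) are assumptions made for illustration. The only property taken from the abstract is that the severity estimate is computed from a low-dimensional intermediate layer whose values can be read out alongside the prediction.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
N_ACOUSTIC = 40   # assumed dimensionality of the acoustic input features
N_HIDDEN = 16     # assumed hidden-layer width
N_INTERP = 4      # assumed number of interpretable bottleneck features

rng = np.random.default_rng(0)


def init_params():
    """Randomly initialise a small network (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, (N_ACOUSTIC, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.1, (N_HIDDEN, N_INTERP)),
        "b2": np.zeros(N_INTERP),
        "W3": rng.normal(0.0, 0.1, (N_INTERP, 1)),
        "b3": np.zeros(1),
    }


def forward(params, x):
    """Return (severity, interpretable_features) for a batch of inputs.

    The severity estimate depends on the input only through the
    low-dimensional interpretable layer, which is the bottleneck.
    """
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    interp = hidden @ params["W2"] + params["b2"]
    severity = interp @ params["W3"] + params["b3"]
    return severity[:, 0], interp


def joint_loss(params, x, severity_target, interp_target, alpha=0.5):
    """Mean squared error on severity plus a weighted error on the bottleneck.

    Supervising the intermediate layer is one assumed way to tie its
    units to clinician-rated quantities; alpha is a hypothetical weight.
    """
    severity, interp = forward(params, x)
    severity_mse = np.mean((severity - severity_target) ** 2)
    interp_mse = np.mean((interp - interp_target) ** 2)
    return severity_mse + alpha * interp_mse


# Usage on random placeholder data (not real speech features or ratings).
params = init_params()
x = rng.normal(size=(8, N_ACOUSTIC))
severity, interp = forward(params, x)
loss = joint_loss(params, x, np.zeros(8), np.zeros((8, N_INTERP)))
print(severity.shape, interp.shape, float(loss))
```

At inference time, `forward` returns both outputs, so the bottleneck values can be shown to a clinician next to the severity estimate.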


DOI: 10.21437/Interspeech.2017-1222

Cite as: Tu, M., Berisha, V., Liss, J. (2017) Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks. Proc. Interspeech 2017, 1849-1853, DOI: 10.21437/Interspeech.2017-1222.


@inproceedings{Tu2017,
  author={Ming Tu and Visar Berisha and Julie Liss},
  title={Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1849--1853},
  doi={10.21437/Interspeech.2017-1222},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1222}
}