Dysarthric Speech Recognition Based on Deep Metric Learning

Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki


In this paper, we present an automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. Because such utterances are often unstable or unclear, speech recognition systems have difficulty recognizing the speech of people with this disorder. For example, their speaking style often fluctuates greatly even when they repeat the same sentences. As a result, their speech exhibits large variation even within a single recognition class. To alleviate this intra-class variation problem, we propose an ASR system based on deep metric learning. The system learns an embedded representation in which the distance between input utterances of the same class is small, while the distance between utterances of different classes is large. This makes it easier for the ASR system to distinguish dysarthric speech. Experimental results show that our proposed approach using deep metric learning consistently improves word-recognition accuracy. Moreover, we also evaluate the combination of our proposed method with transfer learning from unimpaired speech to alleviate the low-resource problem associated with impaired speech.
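The core idea of deep metric learning described above can be illustrated with a standard triplet-margin loss: embeddings of same-class utterances are pulled together while different-class embeddings are pushed at least a margin apart. The sketch below is a minimal NumPy illustration of that objective, not the paper's exact loss or network; all names and values are hypothetical.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: penalize when the same-class distance (anchor-positive)
    is not at least `margin` smaller than the different-class distance
    (anchor-negative)."""
    d_pos = np.linalg.norm(anchor - positive)  # intra-class distance
    d_neg = np.linalg.norm(anchor - negative)  # inter-class distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: anchor and positive are close, negative is far away,
# so the margin constraint is satisfied and the loss is zero.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([3.0, 0.0])
print(triplet_margin_loss(anchor, positive, negative))  # 0.0
```

Minimizing this loss over many such triplets is what drives the embedding space toward small intra-class and large inter-class distances.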


DOI: 10.21437/Interspeech.2020-2267

Cite as: Takashima, Y., Takashima, R., Takiguchi, T., Ariki, Y. (2020) Dysarthric Speech Recognition Based on Deep Metric Learning. Proc. Interspeech 2020, 4796-4800, DOI: 10.21437/Interspeech.2020-2267.


@inproceedings{Takashima2020,
  author={Yuki Takashima and Ryoichi Takashima and Tetsuya Takiguchi and Yasuo Ariki},
  title={{Dysarthric Speech Recognition Based on Deep Metric Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4796--4800},
  doi={10.21437/Interspeech.2020-2267},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2267}
}