Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription

Yuqin Lin, Longbiao Wang, Sheng Li, Jianwu Dang, Chenchen Ding


This study proposes a staged knowledge distillation method for building end-to-end (E2E) automatic speech recognition (ASR) and automatic speech attribute transcription (ASAT) systems for patients with dysarthria caused by either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS). Compared with traditional methods, the proposed method uses limited dysarthric speech more effectively: on the TORGO dataset, the dysarthric E2E-ASR and ASAT systems enhanced by the proposed method achieve a 38.28% relative phone error rate (PER) reduction and a 48.33% relative attribute detection error rate (DER) reduction over their respective baselines. The experiments show that our system offers potential as a rehabilitation tool and medical diagnostic aid.
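The abstract does not detail the staging scheme, but the core building block, knowledge distillation, is well established: a student model is trained against both hard labels and a teacher model's temperature-softened output distribution. The sketch below illustrates that standard distillation loss (in the style of Hinton et al.); the function names, the temperature `T`, and the mixing weight `alpha` are illustrative assumptions, not the paper's actual configuration, and the staging (which teacher distills into which student, and when) is not shown.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T yields a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Standard soft-label distillation loss (illustrative, not the paper's
    exact recipe): weighted sum of cross-entropy with the hard label and
    KL divergence from the teacher's temperature-softened outputs."""
    # Hard-label cross-entropy term (computed at T = 1).
    p_student = softmax(student_logits)
    ce = -math.log(p_student[hard_label])
    # Soft-label KL term at temperature T, rescaled by T^2 as is conventional
    # so its gradient magnitude stays comparable across temperatures.
    p_s_T = softmax(student_logits, T)
    p_t_T = softmax(teacher_logits, T)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t_T, p_s_T))
    return alpha * ce + (1.0 - alpha) * (T * T) * kl
```

When the teacher and student agree exactly, the KL term vanishes and only the hard-label cross-entropy remains; as the teacher's distribution diverges from the student's, the loss grows.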


DOI: 10.21437/Interspeech.2020-1755

Cite as: Lin, Y., Wang, L., Li, S., Dang, J., Ding, C. (2020) Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription. Proc. Interspeech 2020, 4791-4795, DOI: 10.21437/Interspeech.2020-1755.


@inproceedings{Lin2020,
  author={Yuqin Lin and Longbiao Wang and Sheng Li and Jianwu Dang and Chenchen Ding},
  title={{Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4791--4795},
  doi={10.21437/Interspeech.2020-1755},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1755}
}