Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu


Dysarthria is a motor speech disorder that results in mumbled, slurred, or slow speech, which is generally difficult for both humans and machines to understand. Traditional Automatic Speech Recognition (ASR) systems perform poorly on dysarthric speech recognition tasks. In this paper, we propose the use of deep autoencoders to enhance Mel Frequency Cepstral Coefficient (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder, which is in turn used to obtain an improved feature representation for dysarthric speech. Additionally, we analyze the use of severity-based tempo adaptation followed by autoencoder-based speech feature enhancement. All evaluations were carried out on the Universal Access dysarthric speech corpus. An overall absolute improvement of 16% was achieved using tempo adaptation followed by autoencoder-based front-end speech representation for DNN-HMM based dysarthric speech recognition.
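The core idea of the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the deep architecture, real MFCC extraction, and the UA-Speech data are replaced by a single-hidden-layer autoencoder trained with plain gradient descent on synthetic 13-dimensional "MFCC-like" vectors. The autoencoder is fitted only on "healthy" feature vectors; passing distorted ("dysarthric") vectors through it then projects them toward the healthy feature manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden=8, lr=0.1, epochs=3000):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    to reconstruct X under mean-squared error, using full-batch GD."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encoder activations
        Y = H @ W2 + b2                 # linear reconstruction
        err = Y - X                     # reconstruction error
        # Backpropagate the MSE gradient through decoder and encoder.
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1 = X.T @ dH / n;  gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def enhance(X, params):
    """Map feature vectors through the trained autoencoder."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic stand-ins (an assumption of this sketch): "healthy" vectors
# lie on a low-dimensional manifold; "dysarthric" vectors are the same
# manifold plus distortion.
d = 13                                   # typical MFCC dimensionality
A = rng.normal(size=(3, d))
healthy = np.tanh(rng.normal(size=(500, 3)) @ A)
params = train_autoencoder(healthy)

clean = np.tanh(rng.normal(size=(100, 3)) @ A)
dysarthric = clean + rng.normal(0, 0.5, clean.shape)

before = np.mean((dysarthric - clean) ** 2)
after = np.mean((enhance(dysarthric, params) - clean) ** 2)
print(f"feature MSE before: {before:.3f}, after enhancement: {after:.3f}")
```

Because the autoencoder only learns to reconstruct vectors near the healthy manifold, components of the distortion orthogonal to that manifold are suppressed, which is the intuition behind using it as a front-end for dysarthric ASR.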


DOI: 10.21437/Interspeech.2017-1318

Cite as: Vachhani, B., Bhat, C., Das, B., Kopparapu, S.K. (2017) Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition. Proc. Interspeech 2017, 1854-1858, DOI: 10.21437/Interspeech.2017-1318.


@inproceedings{Vachhani2017,
  author={Bhavik Vachhani and Chitralekha Bhat and Biswajit Das and Sunil Kumar Kopparapu},
  title={Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1854--1858},
  doi={10.21437/Interspeech.2017-1318},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1318}
}