Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition

Zhengjun Yue, Heidi Christensen, Jon Barker


Automatic recognition of dysarthric speech is a very challenging research problem where performance still lags far behind that achieved for typical speech. The main reason is the lack of suitable training data to account for the large mismatch seen between dysarthric and typical speech. Only recently has focus moved from single-word tasks to exploring the continuous-speech ASR needed for dictation and most voice-enabled interfaces. This paper investigates improvements to dysarthric continuous ASR. In particular, we demonstrate the effectiveness of using an unsupervised autoencoder-based bottleneck (AE-BN) feature extractor trained on out-of-domain (OOD) LibriSpeech data. We further explore multi-task optimisation techniques shown to benefit typical speech ASR. We propose a 5-fold cross-training setup on the widely used TORGO dysarthric database, a setup we believe is more suitable for this low-resource data domain. Results show that adding the proposed AE-BN features achieves an average absolute word error rate (WER) improvement of 2.63% compared to the baseline system. A further reduction of 2.33% and 0.65% absolute WER is seen when applying monophone regularisation and joint optimisation techniques, respectively. In general, the ASR system employing monophone regularisation trained on AE-BN features exhibits the best performance.
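The core idea of AE-BN features can be sketched as follows: an autoencoder is trained without labels on out-of-domain data to reconstruct its input through a narrow bottleneck layer, and the bottleneck activations are then used as compact features for the downstream acoustic model. The sketch below is a minimal illustration in numpy; the dimensions, the single-layer architecture, the toy data standing in for LibriSpeech frames, and the feature-concatenation step are all illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumption, not the paper's setup):
# 40-dim filterbank-like input frames, 13-dim bottleneck.
input_dim, bn_dim = 40, 13

# Toy frames standing in for out-of-domain (LibriSpeech) features.
X = rng.standard_normal((500, input_dim))

# One-hidden-layer autoencoder: tanh bottleneck encoder, linear decoder.
W_enc = rng.standard_normal((input_dim, bn_dim)) * 0.1
W_dec = rng.standard_normal((bn_dim, input_dim)) * 0.1

lr = 0.01
for epoch in range(200):
    H = np.tanh(X @ W_enc)        # bottleneck activations
    X_hat = H @ W_dec             # reconstruction of the input
    err = X_hat - X               # gradient of 0.5 * MSE w.r.t. X_hat
    # Backpropagate through the decoder and the tanh bottleneck.
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ ((err @ W_dec.T) * (1.0 - H**2)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def ae_bn_features(frames):
    """After unsupervised training, the encoder's bottleneck
    outputs serve as AE-BN features for new frames."""
    return np.tanh(frames @ W_enc)

# One common way to use bottleneck features is to concatenate them
# with the original acoustic features before the acoustic model.
feats = np.hstack([X, ae_bn_features(X)])
print(feats.shape)  # (500, 53)
```

Because the autoencoder objective needs no transcriptions, the extractor can be trained on plentiful typical speech and then applied to the much smaller dysarthric corpus, which is the mismatch-reduction motivation described in the abstract.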


DOI: 10.21437/Interspeech.2020-2746

Cite as: Yue, Z., Christensen, H., Barker, J. (2020) Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition. Proc. Interspeech 2020, 4581-4585, DOI: 10.21437/Interspeech.2020-2746.


@inproceedings{Yue2020,
  author={Zhengjun Yue and Heidi Christensen and Jon Barker},
  title={{Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4581--4585},
  doi={10.21437/Interspeech.2020-2746},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2746}
}