Knowledge Distillation for Throat Microphone Speech Recognition

Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura

Throat microphones are robust against external noise because they receive vibrations directly from the skin; however, the speech data available for them is limited. This work aims to improve the speech recognition accuracy of throat microphones by proposing a knowledge distillation method for hybrid DNN-HMM acoustic models. The method distills knowledge from an acoustic model trained on a large amount of close-talk microphone speech data (the teacher model) into an acoustic model for throat microphones (the student model), using a small amount of parallel throat and close-talk microphone data. The front-end network of the student model contains a feature mapping network from throat microphone acoustic features to close-talk microphone bottleneck features, and the back-end network is a phonetic discrimination network that operates on close-talk microphone bottleneck features. We further attempted to improve recognition accuracy by initializing the student model parameters with pretrained front-end and back-end networks. Experimental results on Japanese read speech data showed that the proposed approach achieved a 9.8% relative improvement in character error rate (14.3% → 12.9%) compared with the hybrid acoustic model trained only on throat microphone speech data. Furthermore, in noisy environments of approximately 70 dBA or higher, the throat microphone system with our approach outperformed the close-talk microphone system.
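As an illustration of the teacher-student idea described above, the following minimal sketch computes a frame-level soft-target loss: the cross-entropy between the teacher's posteriors (from the close-talk frame) and the student's posteriors (from the parallel throat-microphone frame). The abstract does not state the exact loss, so this formulation, the `temperature` parameter, and all function names are assumptions made for illustration only. A small helper also reproduces the reported relative improvement from the two character error rates.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits (temperature is an assumed knob)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's posteriors against the teacher's soft targets.

    teacher_logits: hypothetical teacher outputs for a close-talk frame.
    student_logits: hypothetical student outputs for the parallel throat-mic frame.
    """
    teacher_post = softmax(teacher_logits, temperature)
    student_post = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_post, student_post))


def relative_improvement(baseline_cer, new_cer):
    """Relative error-rate reduction, in percent."""
    return (baseline_cer - new_cer) / baseline_cer * 100.0


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    matched = distillation_loss(teacher, [2.0, 0.5, -1.0])
    mismatched = distillation_loss(teacher, [-1.0, 0.5, 2.0])
    print(matched < mismatched)  # the loss is lowest when the student matches the teacher
    print(round(relative_improvement(14.3, 12.9), 1))  # CER figures from the abstract
```

In a real system this loss would be averaged over all frames of the parallel data and minimized with respect to the student's parameters; the sketch only shows the per-frame quantity.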

DOI: 10.21437/Interspeech.2019-1597

Cite as: Suzuki, T., Ogata, J., Tsunakawa, T., Nishida, M., Nishimura, M. (2019) Knowledge Distillation for Throat Microphone Speech Recognition. Proc. Interspeech 2019, 461-465, DOI: 10.21437/Interspeech.2019-1597.

  author={Takahito Suzuki and Jun Ogata and Takashi Tsunakawa and Masafumi Nishida and Masafumi Nishimura},
  title={{Knowledge Distillation for Throat Microphone Speech Recognition}},
  booktitle={Proc. Interspeech 2019},