Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition

Masakiyo Fujimoto


In this paper, we present a framework for factored deep convolutional neural network (CNN) learning for noise robust automatic speech recognition (ASR). The deep CNN architecture, which has attracted great attention in various research areas, has also been successfully applied to ASR. However, merely introducing a deep CNN architecture into the acoustic modeling of ASR is insufficient to ensure noise robustness, so we introduce a factored network architecture into deep CNN-based acoustic modeling. The proposed factored deep CNN framework factors out feature enhancement, delta parameter learning, and hidden Markov model state classification into three specific network blocks. By assigning a specific role to each block, the noise robustness of deep CNN-based acoustic models can be improved. Through various comparative evaluations, we show that the proposed method successfully improves ASR accuracy in noisy environments.
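The factoring described above can be illustrated as a pipeline of three stages. The sketch below is hypothetical and is not the paper's actual network: `enhance` and `classify` are stand-in placeholders (a simple bias subtraction and a linear softmax layer), and only `deltas` implements a concrete, standard computation (the usual regression formula for delta features). It shows only how the three roles compose.

```python
import numpy as np

def enhance(noisy_feats, noise_est=0.0):
    # Placeholder for the feature-enhancement block: subtract a noise estimate.
    # (The paper's block is a learned network; this is only an illustration.)
    return noisy_feats - noise_est

def deltas(feats, N=2):
    # Standard delta (dynamic) features over the time axis (axis 0):
    #   d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return d / denom

def classify(feats_with_deltas, weights, bias):
    # Placeholder for the HMM-state classification block: linear layer + softmax.
    logits = feats_with_deltas @ weights + bias
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Factored pipeline: enhancement -> delta learning -> state classification.
T, D, S = 10, 3, 4  # frames, static feature dim, number of HMM states
rng = np.random.default_rng(0)
x = rng.normal(size=(T, D))                       # noisy input features
enhanced = enhance(x, noise_est=0.1)              # block 1
full = np.concatenate([enhanced, deltas(enhanced)], axis=1)  # block 2
probs = classify(full, rng.normal(size=(2 * D, S)), np.zeros(S))  # block 3
```

In the paper each stage is a trainable network block; here only the composition of the three roles is sketched.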


 DOI: 10.21437/Interspeech.2017-225

Cite as: Fujimoto, M. (2017) Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition. Proc. Interspeech 2017, 3837-3841, DOI: 10.21437/Interspeech.2017-225.


@inproceedings{Fujimoto2017,
  author={Masakiyo Fujimoto},
  title={Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={3837--3841},
  doi={10.21437/Interspeech.2017-225},
  url={http://dx.doi.org/10.21437/Interspeech.2017-225}
}