Multi-Scale Convolution for Robust Keyword Spotting

Chen Yang, Xue Wen, Liming Song


We propose a robust small-footprint keyword spotting system for resource-constrained devices. The small footprint is achieved by using depthwise-separable convolutions in a ResNet framework. Noise robustness is achieved with a multi-scale ensemble of classifiers: each classifier is specialized for a different view of the input, while the whole ensemble remains compact through heavy parameter sharing. Extensive experiments on the public Google Command dataset demonstrate the effectiveness of the proposed method.
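As a rough illustration of why depthwise-separable convolutions reduce model size, the sketch below compares weight counts for a standard 2-D convolution and its depthwise-separable counterpart. The channel counts and kernel size are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(sep / std, 3))  # 36864 4672 0.127
```

In general the ratio of the two counts is 1/c_out + 1/k^2, so the saving grows with the number of output channels and with the kernel size.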


DOI: 10.21437/Interspeech.2020-2185

Cite as: Yang, C., Wen, X., Song, L. (2020) Multi-Scale Convolution for Robust Keyword Spotting. Proc. Interspeech 2020, 2577-2581, DOI: 10.21437/Interspeech.2020-2185.


@inproceedings{Yang2020,
  author={Chen Yang and Xue Wen and Liming Song},
  title={{Multi-Scale Convolution for Robust Keyword Spotting}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2577--2581},
  doi={10.21437/Interspeech.2020-2185},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2185}
}