Deep Template Matching for Small-Footprint and Configurable Keyword Spotting

Peng Zhang, Xueliang Zhang


Keyword spotting (KWS) is an important technique for human–machine interaction, used to detect trigger phrases and voice commands. In practice, consumers and device vendors often want to define their own keywords conveniently. In this paper, we propose a novel template matching approach for KWS based on an end-to-end deep learning method, which uses an attention mechanism to match the input voice to keyword templates in a high-level feature space. The proposed approach needs only a few voice samples (as few as one) to register a new keyword, without any retraining. We conduct experiments on the publicly available Google Speech Commands dataset. The experimental results demonstrate that our method outperforms baseline methods while allowing flexible configuration.
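To make the idea of attention-based template matching and retraining-free registration concrete, the following is a minimal pure-Python sketch. It is not the authors' network: it assumes the input voice and each template have already been mapped to frame-level embedding vectors by some encoder (here replaced by toy vectors), and the names `attention_match`, `TemplateKWS`, `register`, `detect`, and the `threshold` value are hypothetical. Each query frame attends over the template frames with dot-product attention, and the match score is the mean cosine similarity between each query frame and its attention-weighted template summary.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_match(query: List[Vector], template: List[Vector]) -> float:
    """Hypothetical scorer: each query frame attends over all template frames;
    returns the mean cosine similarity between each query frame and its
    attention-weighted template summary (context vector)."""
    dim = len(template[0])
    sims = []
    for q in query:
        weights = softmax([dot(q, t) for t in template])
        context = [sum(w * t[d] for w, t in zip(weights, template)) for d in range(dim)]
        sims.append(cosine(q, context))
    return sum(sims) / len(sims)


class TemplateKWS:
    """Hypothetical registry: enrolling a keyword only stores its embeddings,
    so no model parameters are retrained."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed decision threshold
        self.templates: Dict[str, List[List[Vector]]] = {}

    def register(self, keyword: str, embedding_seq: List[Vector]) -> None:
        # One sample is enough; further samples are appended to the same keyword.
        self.templates.setdefault(keyword, []).append(embedding_seq)

    def detect(self, query: List[Vector]) -> Optional[str]:
        best_kw, best_score = None, -1.0
        for kw, seqs in self.templates.items():
            # With several samples per keyword, keep the best-matching one.
            score = max(attention_match(query, s) for s in seqs)
            if score > best_score:
                best_kw, best_score = kw, score
        return best_kw if best_score >= self.threshold else None


kws = TemplateKWS(threshold=0.8)
kws.register("lights_on", [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
kws.register("stop", [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(kws.detect([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]))  # prints lights_on
```

In this sketch, adding a keyword changes only the stored templates, never the scoring function, which mirrors the configurability claim above; a real system would replace the toy vectors with learned encoder outputs and tune the threshold on held-out data.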


DOI: 10.21437/Interspeech.2020-1761

Cite as: Zhang, P., Zhang, X. (2020) Deep Template Matching for Small-Footprint and Configurable Keyword Spotting. Proc. Interspeech 2020, 2572-2576, DOI: 10.21437/Interspeech.2020-1761.


@inproceedings{Zhang2020,
  author={Peng Zhang and Xueliang Zhang},
  title={{Deep Template Matching for Small-Footprint and Configurable Keyword Spotting}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2572--2576},
  doi={10.21437/Interspeech.2020-1761},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1761}
}