Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting

Théodore Bluche, Thibault Gisselbrecht


In this paper, we propose a fully neural approach to open-vocabulary keyword spotting that lets users add a customizable voice interface to their device and does not require task-specific data. We present a keyword detection neural network weighing less than 250KB, in which the topmost layer, which performs keyword detection, is predicted by an auxiliary network that may be run offline to generate a detector for any keyword. We show that the proposed model outperforms acoustic keyword spotting baselines by a large margin on two tasks of detecting keywords in utterances and three tasks of detecting isolated speech commands. We also propose a method to fine-tune the model when specific training data is available for some keywords, which yields performance similar to a standard speech command neural network while keeping the ability of the model to be applied to new keywords.
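
To make the idea of a predicted topmost layer concrete, the following is a minimal sketch and not the authors' implementation. It assumes a shared acoustic encoder and an auxiliary network that maps a keyword's phoneme sequence to the weights and bias of a single detection unit. All names (`encode`, `predict_filter`, `detect`), the dimensions, the toy phoneme inventory, the mean-pooled phoneme embedding, and the random untrained parameters are hypothetical choices for illustration; the abstract does not specify any of them.

```python
import math
import random

FEAT_DIM = 8      # size of one acoustic feature frame (assumed)
HIDDEN_DIM = 6    # size of the encoder output per frame (assumed)
PHONE_DIM = 4     # size of a phoneme embedding (assumed)
PHONES = ["h", "e", "l", "o", "s", "t", "p"]  # toy phoneme inventory (assumed)

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random matrix standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Untrained stand-in parameters; a real system would learn these.
ENCODER_W = rand_matrix(HIDDEN_DIM, FEAT_DIM)
PHONE_EMB = {p: [rng.uniform(-0.5, 0.5) for _ in range(PHONE_DIM)] for p in PHONES}
FILTER_W = rand_matrix(HIDDEN_DIM + 1, PHONE_DIM)  # last row predicts the bias

def encode(frames):
    """Shared acoustic encoder: one hidden vector per input frame."""
    return [[math.tanh(v) for v in matvec(ENCODER_W, f)] for f in frames]

def predict_filter(phonemes):
    """Auxiliary network: keyword phonemes -> (weights, bias) of the top layer.

    Mean pooling ignores phoneme order; it is a simplification for this sketch.
    This step needs no audio, so it can run offline for each new keyword.
    """
    mean = [sum(PHONE_EMB[p][i] for p in phonemes) / len(phonemes)
            for i in range(PHONE_DIM)]
    out = matvec(FILTER_W, mean)
    return out[:HIDDEN_DIM], out[HIDDEN_DIM]

def detect(frames, weights, bias):
    """Apply the predicted top layer to every frame; return the max sigmoid score."""
    scores = []
    for h in encode(frames):
        logit = sum(w * x for w, x in zip(weights, h)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return max(scores)

# Usage: generate a detector for a keyword once, then score incoming audio.
frames = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(20)]
weights, bias = predict_filter(["h", "e", "l", "o"])
score = detect(frames, weights, bias)
```

In this sketch, only `predict_filter` depends on the keyword, so supporting a new keyword means computing a new `(weights, bias)` pair while the encoder stays unchanged.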


DOI: 10.21437/Interspeech.2020-1186

Cite as: Bluche, T., Gisselbrecht, T. (2020) Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting. Proc. Interspeech 2020, 2552-2556, DOI: 10.21437/Interspeech.2020-1186.


@inproceedings{Bluche2020,
  author={Théodore Bluche and Thibault Gisselbrecht},
  title={{Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2552--2556},
  doi={10.21437/Interspeech.2020-1186},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1186}
}