End-to-End Multi-Look Keyword Spotting

Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu


The performance of keyword spotting (KWS), measured in false alarms and false rejects, degrades significantly under far-field and noisy conditions. In this paper, we propose a multi-look neural network model for speech enhancement that simultaneously steers toward multiple sampled look directions. The multi-look enhancement is then jointly trained with KWS to form an end-to-end KWS model, which integrates the enhanced signals from the multiple look directions and uses an attention mechanism to dynamically weight the more reliable sources. We demonstrate, on our large noisy and far-field evaluation sets, that the proposed approach significantly improves KWS performance over both a baseline KWS system and a recent beamformer-based multi-beam KWS system.
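
As a rough illustration of how attention can weight enhanced signals from several look directions, the sketch below fuses one feature vector per look direction with generic dot-product soft attention. The abstract does not specify the form of the attention used in the paper, so the scoring rule, the function names `softmax` and `attention_fuse`, and the `query` vector are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(look_features, query):
    """Fuse per-look feature vectors into one vector (hypothetical helper).

    look_features: list of K equal-length feature vectors, one per look direction.
    query: vector of the same length, standing in for learned attention parameters.
    Returns (fused_vector, attention_weights).
    """
    # Score each look direction by its dot product with the query (an assumption).
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in look_features]
    weights = softmax(scores)
    dim = len(look_features[0])
    # Weighted sum across look directions, computed per feature dimension.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, look_features))
        for d in range(dim)
    ]
    return fused, weights


if __name__ == "__main__":
    # Two look directions with toy 2-dimensional features.
    fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print(weights, fused)
```

In this toy setup the look direction whose features align better with the query receives the larger weight, which is the general behavior the abstract describes as attending to the more reliable sources.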


DOI: 10.21437/Interspeech.2020-1521

Cite as: Yu, M., Ji, X., Wu, B., Su, D., Yu, D. (2020) End-to-End Multi-Look Keyword Spotting. Proc. Interspeech 2020, 66-70, DOI: 10.21437/Interspeech.2020-1521.


@inproceedings{Yu2020,
  author={Meng Yu and Xuan Ji and Bo Wu and Dan Su and Dong Yu},
  title={{End-to-End Multi-Look Keyword Spotting}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={66--70},
  doi={10.21437/Interspeech.2020-1521},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1521}
}