Building a Robust Word-Level Wakeword Verification Network

Rajath Kumar, Mike Rodehorst, Joe Wang, Jiacheng Gu, Brian Kulis

Wakeword detection is responsible for switching on downstream systems in a voice-activated device. To prevent a response when the wakeword is detected by mistake, a secondary network is often used to verify the detected wakeword. Published verification approaches are based on Automatic Speech Recognition (ASR) biased towards the wakeword. These approaches have several drawbacks, including high model complexity and the need for large-vocabulary training data. To address these shortcomings, we propose a large receptive field (LRF) word-level wakeword model, specifically a convolutional-recurrent-attention (CRA) network. CRA networks use a strided, small receptive field convolutional front-end followed by fixed time-step recurrent layers optimized to model the temporal phonetic dependencies within the wakeword. We show experimentally that this type of modeling makes the system robust to errors in the location of the wakeword as estimated by the detection network. The proposed CRA network significantly outperforms previous baselines, including an LRF whole-word convolutional network and a 2-stage DNN-HMM system. Additionally, we study the importance of pre- and post-wakeword context. Finally, the CRA network has significantly fewer model parameters and multiplies, which makes it suitable for real-world production applications.
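
To make the phrase "strided small receptive field convolutional front-end followed by fixed time-step recurrent layers" concrete, the following is a minimal sketch of the standard output-length and receptive-field arithmetic for stacked 1-D convolutions along the time axis. It assumes no padding and no dilation. The function name `frontend_shape`, the 195-frame input, and the three layers with kernel 3 and stride 2 are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def frontend_shape(num_frames, layers):
    """Return (time_steps, receptive_field, hop) after stacked 1-D convolutions.

    layers is a list of (kernel_size, stride) pairs along the time axis,
    assuming no padding and no dilation.
    """
    steps, receptive_field, hop = num_frames, 1, 1
    for kernel, stride in layers:
        if steps < kernel:
            raise ValueError("input is shorter than the kernel")
        # Number of valid kernel positions at this layer.
        steps = (steps - kernel) // stride + 1
        # Each extra kernel tap widens the view by the current hop, in input frames.
        receptive_field += (kernel - 1) * hop
        # Adjacent outputs are now 'stride' times further apart in input frames.
        hop *= stride
    return steps, receptive_field, hop


# Hypothetical configuration: 195 input frames, three layers with kernel 3, stride 2.
steps, receptive_field, hop = frontend_shape(195, [(3, 2)] * 3)
print(steps, receptive_field, hop)  # 23 15 8
```

Under these assumed settings, a fixed-length input window always yields the same number of recurrent time steps (23), and each step summarizes only a short span of the input (15 frames, advancing 8 frames per step), which leaves the longer-range temporal modeling to the recurrent layers.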

DOI: 10.21437/Interspeech.2020-2018

Cite as: Kumar, R., Rodehorst, M., Wang, J., Gu, J., Kulis, B. (2020) Building a Robust Word-Level Wakeword Verification Network. Proc. Interspeech 2020, 1972-1976, DOI: 10.21437/Interspeech.2020-2018.

  author={Rajath Kumar and Mike Rodehorst and Joe Wang and Jiacheng Gu and Brian Kulis},
  title={{Building a Robust Word-Level Wakeword Verification Network}},
  booktitle={Proc. Interspeech 2020},