Efficient Voice Trigger Detection for Low Resource Hardware

Siddharth Sigtia, Rob Haynes, Hywel Richards, Erik Marchi, John Bridle

We describe the architecture of an always-on keyword spotting (KWS) system that is used to initiate an interaction with battery-powered mobile devices. An always-available voice assistant needs a carefully designed voice keyword detector to satisfy the power and computational constraints of battery-powered devices. We employ a multi-stage system in which a low-power primary stage decides when to run a more accurate (but more power-hungry) secondary detector. We describe a straightforward primary detector and explore variations that yield substantial reductions in computation (or increased accuracy for the same computation). By reducing the set of target labels from three to one per phone and reducing the rate at which the acoustic model is operated, the compute rate can be reduced by a factor of six while maintaining the same accuracy.
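The gating logic of a multi-stage detector can be illustrated with a minimal Python sketch. It assumes hypothetical detector callables, thresholds, and a `primary_stride` parameter that stands in for running the primary acoustic model at a reduced rate. None of these names or values come from the paper, and the toy scores replace real acoustic-model outputs. The sketch only shows that the expensive secondary stage runs when the cheap primary stage fires, and it counts how often each stage is evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical detector signature: (frames, frame_index) -> score.
Detector = Callable[[Sequence[float], int], float]


@dataclass
class CascadeResult:
    triggered: bool       # True if the secondary stage confirmed a trigger
    primary_evals: int    # how many times the cheap primary stage ran
    secondary_evals: int  # how many times the expensive secondary stage ran


def run_cascade(
    frames: Sequence[float],
    primary: Detector,
    secondary: Detector,
    primary_threshold: float,
    secondary_threshold: float,
    primary_stride: int = 1,
) -> CascadeResult:
    """Hypothetical two-stage voice-trigger cascade (illustrative only).

    The primary detector is evaluated every `primary_stride` frames,
    an assumed stand-in for operating the acoustic model at a lower rate.
    The secondary detector runs only when the primary score crosses
    its threshold.
    """
    if primary_stride < 1:
        raise ValueError("primary_stride must be >= 1")
    primary_evals = 0
    secondary_evals = 0
    for t in range(0, len(frames), primary_stride):
        primary_evals += 1
        if primary(frames, t) >= primary_threshold:
            # Only now pay for the more accurate, more power-hungry check.
            secondary_evals += 1
            if secondary(frames, t) >= secondary_threshold:
                return CascadeResult(True, primary_evals, secondary_evals)
    return CascadeResult(False, primary_evals, secondary_evals)


if __name__ == "__main__":
    # Toy "scores": one high-scoring frame at index 10.
    frames = [0.1] * 10 + [0.9] + [0.1] * 9

    def toy_score(f: Sequence[float], t: int) -> float:
        return f[t]

    print(run_cascade(frames, toy_score, toy_score, 0.5, 0.8, primary_stride=1))
    print(run_cascade(frames, toy_score, toy_score, 0.5, 0.8, primary_stride=2))
```

With a stride of 2, the primary stage runs 6 times before reaching the trigger frame instead of 11, and the secondary stage runs once in both cases. In this toy setup a larger stride can skip a short, isolated peak. A real detector mitigates that risk by giving the acoustic model context across neighbouring frames, which this sketch does not model.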

DOI: 10.21437/Interspeech.2018-2204

Cite as: Sigtia, S., Haynes, R., Richards, H., Marchi, E., Bridle, J. (2018) Efficient Voice Trigger Detection for Low Resource Hardware. Proc. Interspeech 2018, 2092-2096, DOI: 10.21437/Interspeech.2018-2204.

  author={Siddharth Sigtia and Rob Haynes and Hywel Richards and Erik Marchi and John Bridle},
  title={Efficient Voice Trigger Detection for Low Resource Hardware},
  booktitle={Proc. Interspeech 2018},