Wake Word Detection with Alignment-Free Lattice-Free MMI

Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur


Always-on spoken language interfaces, e.g., personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in online applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of untranscribed training examples that are annotated only for the presence/absence of the wake word; (ii) we show that the classical keyword/filler model must be supplemented with an explicit non-speech (silence) model for good performance; (iii) we present an FST-based decoder to perform online detection. We evaluate our methods on two real data sets, showing 50%–90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures, and re-validate them on a third (large) data set.
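The evaluation metric named in the abstract, false rejection rate at a pre-specified false alarm rate, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name, the per-utterance score lists, the "detect when score > threshold" convention, and expressing the operating point as a maximum count of false alarms (rather than a rate per hour) are all assumptions made for illustration.

```python
from typing import Sequence


def false_rejection_rate_at_fa(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    max_false_alarms: int,
) -> float:
    """Return the false rejection rate at a fixed false alarm budget.

    Hypothetical helper: an utterance counts as detected when its
    score is strictly greater than the threshold.
    """
    if not positive_scores:
        raise ValueError("need at least one positive (wake word) example")
    if max_false_alarms < 0:
        raise ValueError("max_false_alarms must be non-negative")

    # Sort negative scores from highest to lowest.
    ranked = sorted(negative_scores, reverse=True)

    if max_false_alarms >= len(ranked):
        # Every negative may fire, so the threshold can be arbitrarily low.
        threshold = float("-inf")
    else:
        # Using the (k+1)-th highest negative score as the threshold leaves
        # at most k negatives strictly above it.
        threshold = ranked[max_false_alarms]

    # A positive example is rejected when it does not exceed the threshold.
    rejected = sum(1 for s in positive_scores if s <= threshold)
    return rejected / len(positive_scores)


if __name__ == "__main__":
    positives = [0.9, 0.6, 0.4, 0.95]  # made-up scores for wake word utterances
    negatives = [0.1, 0.5, 0.3, 0.7]   # made-up scores for non-wake-word audio
    for budget in (0, 1):
        frr = false_rejection_rate_at_fa(positives, negatives, budget)
        print(f"false alarms allowed: {budget}, false rejection rate: {frr:.2f}")
```

With a rate expressed per hour of negative audio, the budget would be the target rate multiplied by the duration of the negative set, rounded down; that conversion is likewise an assumption and is not described in the abstract.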


DOI: 10.21437/Interspeech.2020-1811

Cite as: Wang, Y., Lv, H., Povey, D., Xie, L., Khudanpur, S. (2020) Wake Word Detection with Alignment-Free Lattice-Free MMI. Proc. Interspeech 2020, 4258-4262, DOI: 10.21437/Interspeech.2020-1811.


@inproceedings{Wang2020,
  author={Yiming Wang and Hang Lv and Daniel Povey and Lei Xie and Sanjeev Khudanpur},
  title={{Wake Word Detection with Alignment-Free Lattice-Free MMI}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4258--4262},
  doi={10.21437/Interspeech.2020-1811},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1811}
}