Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition

Ashish Mittal, Samarth Bharadwaj, Shreya Khare, Saneem Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury


Spoken intent detection has become a popular way to interact easily with various smart devices. However, such systems are limited to a preset list of intent terms or commands, which prevents personal devices from being quickly customized to new intents. This paper presents a few-shot spoken intent classification approach that uses task-agnostic representations learned through the meta-learning paradigm. Specifically, we leverage popular representation-based meta-learning methods to build a task-agnostic representation of utterances, which a linear classifier then uses for prediction. We evaluate three such approaches on a novel experimental protocol developed on two popular spoken intent classification datasets: Google Commands and Fluent Speech Commands. For 5-shot (1-shot) classification of novel classes, the proposed framework achieves an average classification accuracy of 88.6% (76.3%) on the Google Commands dataset and 78.5% (64.2%) on the Fluent Speech Commands dataset. This performance is comparable to that of traditional supervised classification models trained with abundant samples.
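The abstract describes a two-stage design: a meta-learned encoder produces task-agnostic utterance embeddings, and a linear classifier is fit on the few labelled support examples of each new task. The sketch below illustrates only the second stage, under stated assumptions. It assumes the embeddings have already been computed by some encoder (not shown). Because the abstract does not say which three approaches were evaluated, the sketch uses a prototype-based linear head in the style of Prototypical Networks, which is one common representation-based choice and not necessarily the authors' method. The function names and the toy 2-D embeddings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical linear head: class label -> (weight vector, bias)
Head = Dict[str, Tuple[List[float], float]]


def fit_prototype_classifier(support: Sequence[Vector], labels: Sequence[str]) -> Head:
    """Fit a linear head from K-shot support embeddings (assumed precomputed).

    Each class prototype c_k is the mean of that class's support embeddings.
    Setting w_k = 2 * c_k and b_k = -||c_k||^2 gives
    w_k . x + b_k = -||x - c_k||^2 + ||x||^2, so taking the argmax over
    classes is equivalent to the nearest-prototype rule.
    """
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for emb, lab in zip(support, labels):
        grouped[lab].append(emb)
    head: Head = {}
    for lab, embs in grouped.items():
        dim = len(embs[0])
        proto = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        head[lab] = ([2.0 * p for p in proto], -sum(p * p for p in proto))
    return head


def predict(head: Head, query: Vector) -> str:
    """Return the class whose linear score w_k . x + b_k is highest."""
    def score(lab: str) -> float:
        w, b = head[lab]
        return sum(wi * xi for wi, xi in zip(w, query)) + b
    return max(head, key=score)


def episode_accuracy(head: Head, queries: Sequence[Vector], query_labels: Sequence[str]) -> float:
    """Fraction of query utterances in one episode that are classified correctly."""
    correct = sum(predict(head, q) == y for q, y in zip(queries, query_labels))
    return correct / len(queries)


if __name__ == "__main__":
    # Toy 2-way 2-shot episode with made-up 2-D "utterance embeddings".
    support = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
    labels = ["yes", "yes", "no", "no"]
    head = fit_prototype_classifier(support, labels)
    print(predict(head, [0.7, 0.1]))  # yes
    print(episode_accuracy(head, [[0.7, 0.1], [0.1, 0.6]], ["yes", "no"]))  # 1.0
```

In this scheme, only the inexpensive linear head is refit for each new set of intents, while the encoder stays fixed. This reflects the quick-customization goal stated in the abstract. Averaging `episode_accuracy` over many sampled N-way K-shot episodes would give an accuracy figure of the kind the abstract reports.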


 DOI: 10.21437/Interspeech.2020-3208

Cite as: Mittal, A., Bharadwaj, S., Khare, S., Chemmengath, S., Sankaranarayanan, K., Kingsbury, B. (2020) Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition. Proc. Interspeech 2020, 4283-4287, DOI: 10.21437/Interspeech.2020-3208.


@inproceedings{Mittal2020,
  author={Ashish Mittal and Samarth Bharadwaj and Shreya Khare and Saneem Chemmengath and Karthik Sankaranarayanan and Brian Kingsbury},
  title={{Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4283--4287},
  doi={10.21437/Interspeech.2020-3208},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3208}
}