Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios

Ankur Kumar, Sachin Singh, Dhananjaya Gowda, Abhinav Garg, Shatrughan Singh, Chanwoo Kim


In this paper, we present techniques to compute a confidence score for the predictions made by an end-to-end speech recognition model. Our proposed neural confidence measure (NCM) is trained as a binary classifier that accepts or rejects an end-to-end speech recognition result. Incorporating features from the encoder, the decoder, and the attention block of the attention-based end-to-end speech recognition model significantly improves the NCM, and we observe that using information from multiple beams improves performance further. As a case study of this NCM, we consider an application of the utterance-level confidence score in a distributed speech recognition environment with two or more speech recognition systems running on platforms with different resource capabilities. We show that around 57% of the computation on a resource-rich high-end platform (e.g., a cloud platform) can be saved without sacrificing accuracy compared to the high-end-only solution. Around 70–80% of the computation can be saved if we allow the word error rate to degrade by 5–10% relative to the high-end solution.
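To make the distributed-recognition use case concrete, the sketch below illustrates the general idea of confidence-gated routing: an utterance-level confidence is computed from features of the on-device model, and only low-confidence utterances are escalated to the high-end (cloud) recognizer. This is an illustrative sketch, not the paper's implementation; the tiny logistic model, its weights, and the feature values are assumptions standing in for the trained NCM over encoder, decoder, and attention features.

```python
import math


def sigmoid(x):
    # Standard logistic function, mapping a real score to (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def ncm_score(features, weights, bias):
    """Hypothetical utterance-level confidence: a logistic model over
    features extracted from the on-device end-to-end recognizer
    (stand-in for the paper's trained neural confidence measure)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)


def route_utterance(features, weights, bias, threshold=0.5):
    """Accept the on-device hypothesis when confidence clears the
    threshold; otherwise forward the utterance to the cloud system."""
    if ncm_score(features, weights, bias) >= threshold:
        return "on-device"
    return "cloud"
```

Raising the threshold sends more utterances to the cloud (lower word error rate, less computation saved); lowering it saves more cloud computation at some accuracy cost, which is the trade-off behind the 57% and 70–80% savings figures quoted in the abstract.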


DOI: 10.21437/Interspeech.2020-3216

Cite as: Kumar, A., Singh, S., Gowda, D., Garg, A., Singh, S., Kim, C. (2020) Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios. Proc. Interspeech 2020, 4357-4361, DOI: 10.21437/Interspeech.2020-3216.


@inproceedings{Kumar2020,
  author={Ankur Kumar and Sachin Singh and Dhananjaya Gowda and Abhinav Garg and Shatrughan Singh and Chanwoo Kim},
  title={{Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4357--4361},
  doi={10.21437/Interspeech.2020-3216},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3216}
}