Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition

Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim


In this paper, we propose a hierarchical multi-stage word-to-grapheme Named Entity Correction (NEC) algorithm. Conventional NEC algorithms use a single-stage grapheme- or phoneme-level edit distance to search for and replace Named Entities (NEs) misrecognized by a speech recognizer. However, longer named entities such as song titles cannot be easily handled by such single-stage correction. We propose a three-stage NEC that starts with word-level matching, followed by phonetic matching based on Double Metaphone, and ends with grapheme-level candidate selection. We also propose a novel NE rejection mechanism, which is important to ensure that the NEC does not replace correctly recognized NEs with unintended but similar named entities. We evaluate our solution on two test sets, from the call and music domains, for both server and on-device speech recognition configurations. For the on-device model, our NEC, when employed standalone, outperforms n-gram fusion. When used after an n-gram based biasing language model, our NEC achieves relative word error rate reductions of 14% and 63% on the music and call test sets, respectively. The average latency of our NEC is under 3 ms per input sentence, while using only ~1 MB of memory for an input NE list of 20,000 entries.
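The three stages and the rejection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rejection threshold of 0.34, and the assumption that the hypothesis is already an isolated NE span are all hypothetical. In particular, `phonetic_key` is a crude consonant-skeleton stand-in for Double Metaphone, used only to keep the example self-contained.

```python
def phonetic_key(word):
    """Crude stand-in for Double Metaphone (assumption, not the real algorithm):
    keep the first letter, drop later vowels and h/w/y, collapse repeats."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouyhw":
            continue
        if ch != key[-1]:
            key += ch
    return key


def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def phonetic_form(text):
    """Phonetic key of each word, joined by spaces."""
    return " ".join(phonetic_key(w) for w in text.lower().split())


def correct_entity(hypothesis, entities, reject_above=0.34):
    """Return the best entity for `hypothesis`, or `hypothesis` itself if rejected.
    `reject_above` is a hypothetical threshold on normalized grapheme distance."""
    if not entities:
        return hypothesis
    hyp = hypothesis.lower()
    hyp_words = set(hyp.split())

    # Stage 1: word-level matching. Keep entities with the largest word overlap
    # (all entities survive when no entity shares a word with the hypothesis).
    overlap = {e: len(hyp_words & set(e.lower().split())) for e in entities}
    best_overlap = max(overlap.values())
    stage1 = [e for e in entities if overlap[e] == best_overlap]

    # Stage 2: phonetic matching. Keep entities whose phonetic form is closest.
    hyp_phon = phonetic_form(hyp)
    phon = {e: edit_distance(hyp_phon, phonetic_form(e)) for e in stage1}
    best_phon = min(phon.values())
    stage2 = [e for e in stage1 if phon[e] == best_phon]

    # Stage 3: grapheme-level selection among the remaining candidates.
    best = min(stage2, key=lambda e: edit_distance(hyp, e.lower()))

    # Rejection: keep the recognizer output if even the best candidate is too far.
    dist = edit_distance(hyp, best.lower())
    if dist / max(len(hyp), len(best), 1) > reject_above:
        return hypothesis
    return best
```

With a hypothetical contact list `["john smith", "jane smart", "joan smithers"]`, the input "jon smyth" is mapped to "john smith", while "play some jazz" is left unchanged because no candidate is close enough at the grapheme level.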


DOI: 10.21437/Interspeech.2020-3174

Cite as: Garg, A., Gupta, A., Gowda, D., Singh, S., Kim, C. (2020) Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition. Proc. Interspeech 2020, 1793-1797, DOI: 10.21437/Interspeech.2020-3174.


@inproceedings{Garg2020,
  author={Abhinav Garg and Ashutosh Gupta and Dhananjaya Gowda and Shatrughan Singh and Chanwoo Kim},
  title={{Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1793--1797},
  doi={10.21437/Interspeech.2020-3174},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3174}
}