Shallow-Fusion End-to-End Contextual Biasing

Ding Zhao, Tara N. Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang

Contextual biasing to a specific domain, including a user’s song names, app names and contact names, is an important component of any production-level automatic speech recognition (ASR) system. Contextual biasing is particularly challenging in end-to-end models because these models keep only a small list of candidates during beam search, and they also perform poorly on proper nouns, which are the main source of biasing phrases. In this paper, we present various algorithmic and training improvements to shallow-fusion-based biasing for end-to-end models. We show that the proposed approach obtains better performance than a state-of-the-art conventional model across a variety of tasks, the first time this has been demonstrated.
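The abstract names shallow fusion without spelling it out. In shallow fusion, each beam-search extension is scored by the end-to-end model's log-probability plus a weighted contextual score, log P(y|x) + λ·r(y). The sketch below assumes a toy model given as fixed, history-free per-step log-probability tables and word-level tokens. The helper names (`build_bias_set`, `bias_step`, `beam_search`), the flat prefix set standing in for a biasing FST, and the rule of subtracting the bonus when a partial match fails are illustrative assumptions, not the authors' implementation. The sketch does show why a small beam matters: the reward is added before pruning, so a biased candidate can survive the cut.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def build_bias_set(phrases: Sequence[Sequence[str]]) -> Tuple[FrozenSet[Phrase], FrozenSet[Phrase]]:
    """Return (every non-empty prefix, every complete phrase) of the biasing list."""
    prefixes, full = set(), set()
    for phrase in phrases:
        p = tuple(phrase)
        full.add(p)
        for i in range(1, len(p) + 1):
            prefixes.add(p[:i])
    return frozenset(prefixes), frozenset(full)


def bias_step(state: Phrase, token: str, prefixes: FrozenSet[Phrase],
              full: FrozenSet[Phrase], bonus: float) -> Tuple[Phrase, float]:
    """Advance the partial match `state` by `token`; return (new_state, reward).

    Each matched token earns `bonus`. If the match breaks before a phrase is
    complete, the bonus granted so far is subtracted again, so only completed
    phrases keep their boost. Assumes no phrase is a prefix of another."""
    extended = state + (token,)
    if extended in prefixes:
        return (() if extended in full else extended), bonus
    reward = -bonus * len(state)  # undo the partial-match bonus
    if (token,) in prefixes:  # try to start a new match at this token
        return (() if (token,) in full else (token,)), reward + bonus
    return (), reward


def beam_search(step_logprobs: List[Dict[str, float]], phrases: Sequence[Sequence[str]],
                beam: int = 2, lam: float = 1.0, bonus: float = 1.0) -> List[str]:
    """Toy shallow-fusion beam search: model log-prob + lam * biasing reward.

    `step_logprobs` is a hypothetical stand-in for the end-to-end model, with
    one history-free log-prob table per output step."""
    prefixes, full = build_bias_set(phrases)
    hyps = [((), (), 0.0)]  # (tokens, bias_state, fused_score)
    for table in step_logprobs:
        cands = []
        for tokens, state, score in hyps:
            for tok, logp in table.items():
                new_state, r = bias_step(state, tok, prefixes, full, bonus)
                cands.append((tokens + (tok,), new_state, score + logp + lam * r))
        # The reward is added before pruning, so a boosted hypothesis can survive
        # even when the model alone would rank it outside the small beam.
        cands.sort(key=lambda c: c[2], reverse=True)
        hyps = cands[:beam]
    # Remove the bonus of any phrase still only partially matched at the end.
    best = max(hyps, key=lambda h: h[2] - lam * bonus * len(h[1]))
    return list(best[0])


steps = [{"call": math.log(0.9), "fall": math.log(0.1)},
         {"bob": math.log(0.3), "rob": math.log(0.7)}]
print(beam_search(steps, [["bob"]], bonus=0.0))  # ['call', 'rob']
print(beam_search(steps, [["bob"]], bonus=1.0))  # ['call', 'bob']
```

With no bonus, the model prefers the common word "rob". Adding the contact name "bob" to the biasing list raises that hypothesis above it. The weight λ (`lam`) and the per-token `bonus` trade off recall of biasing phrases against false triggering on utterances that do not contain them.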

 DOI: 10.21437/Interspeech.2019-1209

Cite as: Zhao, D., Sainath, T.N., Rybach, D., Rondon, P., Bhatia, D., Li, B., Pang, R. (2019) Shallow-Fusion End-to-End Contextual Biasing. Proc. Interspeech 2019, 1418-1422, DOI: 10.21437/Interspeech.2019-1209.

  author={Ding Zhao and Tara N. Sainath and David Rybach and Pat Rondon and Deepti Bhatia and Bo Li and Ruoming Pang},
  title={{Shallow-Fusion End-to-End Contextual Biasing}},
  booktitle={Proc. Interspeech 2019},