ISCA Archive Interspeech 2013

Discriminative training of WFST factors with application to pronunciation modeling

Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu

One of the most popular speech recognition architectures consists of multiple components (such as the acoustic, pronunciation, and language models) that are modeled as weighted finite state transducer (WFST) factors in a cascade. These factor WFSTs are typically trained in isolation and combined efficiently for decoding. Recent work has explored jointly estimating parameters for these models using considerable amounts of training data. We propose an alternative approach that selectively trains factor WFSTs in such an architecture while still leveraging information from the entire cascade. If a factor is small, this technique allows its parameters to be estimated effectively from relatively small amounts of data. Our approach adapts an online training paradigm for linear models to discriminatively train one or more WFSTs in a cascade. We apply this method to train a pronunciation model for recognition of conversational speech, resulting in significant improvements in recognition performance over the baseline model.
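The core idea, as the abstract describes it, is to score hypotheses with the whole cascade but update the weights of only one factor. A minimal sketch of such an online, perceptron-style update is below. All names and data here are illustrative assumptions, not the paper's actual model: WFST paths are reduced to plain strings, the "cascade" is simplified to an acoustic score plus a pronunciation-factor weight, and the update is a basic structured perceptron that touches only the pronunciation weights.

```python
# Hedged sketch: discriminatively train only one factor (a toy
# "pronunciation model" weight table) while scoring candidates with the
# full cascade (here simplified to acoustic score + pronunciation weight).

def decode(candidates, acoustic_scores, pron_weights):
    """Pick the candidate pronunciation with the best combined cascade score."""
    return max(candidates,
               key=lambda p: acoustic_scores[p] + pron_weights.get(p, 0.0))

def perceptron_update(example, pron_weights, lr=1.0):
    """One online update: only pronunciation-factor weights change."""
    candidates, acoustic_scores, reference = example
    hyp = decode(candidates, acoustic_scores, pron_weights)
    if hyp != reference:
        # Reward the reference path's features, penalize the hypothesis's.
        pron_weights[reference] = pron_weights.get(reference, 0.0) + lr
        pron_weights[hyp] = pron_weights.get(hyp, 0.0) - lr
    return pron_weights

# Toy example: two pronunciation candidates; the acoustic model alone
# prefers the wrong one, and the trained factor corrects it.
example = (["t ah m ey t ow", "t ah m aa t ow"],           # candidates
           {"t ah m ey t ow": 0.2, "t ah m aa t ow": 0.1},  # acoustic scores
           "t ah m aa t ow")                                # reference
w = {}
for _ in range(3):  # a few online passes over the (single) example
    w = perceptron_update(example, w)
print(decode(example[0], example[1], w))  # -> t ah m aa t ow
```

In the paper's actual setting the candidates would be paths through the composed WFST cascade and the features would live on arcs of the factor being trained; this sketch only conveys the selective-update structure.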

doi: 10.21437/Interspeech.2013-467

Cite as: Jyothi, P., Fosler-Lussier, E., Livescu, K. (2013) Discriminative training of WFST factors with application to pronunciation modeling. Proc. Interspeech 2013, 1961-1965, doi: 10.21437/Interspeech.2013-467

@inproceedings{jyothi13_interspeech,
  author={Preethi Jyothi and Eric Fosler-Lussier and Karen Livescu},
  title={{Discriminative training of WFST factors with application to pronunciation modeling}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1961--1965},
  doi={10.21437/Interspeech.2013-467}
}