ISCA Archive Interspeech 2013

Discriminative pronunciation modeling based on minimum phone error training

Meixu Song, Qingqing Zhang, Jielin Pan, Yonghong Yan

Introducing pronunciation models into decoding has proven beneficial for large-vocabulary continuous speech recognition (LVCSR). As Minimum Phone Error (MPE) training has become a nearly standard scheme for acoustic modeling, a discriminative pronunciation modeling method is investigated within the MPE training framework. To bring the pronunciation models into MPE training, the auxiliary function of MPE training is rewritten at the word level and decomposed into two parts: one for co-training the acoustic models, and the other for discriminatively training the pronunciation models. On a Mandarin conversational telephone speech recognition task, compared to the baseline using a canonical lexicon, the discriminative pronunciation models reduced the Character Error Rate (CER) by 0.7% absolute on the LDC test set, and with acoustic model co-training, an additional CER reduction of about 1% was achieved.
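The results above are reported as Character Error Rate. As a minimal sketch of how that metric is commonly computed (not the authors' scoring tool), the following Python code takes the character-level Levenshtein distance between each reference and hypothesis and divides the total by the number of reference characters. The function names `edit_distance` and `cer` are hypothetical, and the assumption that each character is one scoring unit (no text normalization) is ours, not the paper's.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, and deletions each cost 1)."""
    # prev[j] is the distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character Error Rate over paired reference/hypothesis strings:
    total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


def absolute_reduction(baseline_cer, new_cer):
    """Absolute reduction: the plain difference between two error rates."""
    return baseline_cer - new_cer


def relative_reduction(baseline_cer, new_cer):
    """Relative reduction: the difference as a fraction of the baseline."""
    return (baseline_cer - new_cer) / baseline_cer
```

An absolute reduction is the difference between two error rates in percentage points, whereas a relative reduction divides that difference by the baseline rate; the 0.7% figure in the abstract is stated as absolute.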

doi: 10.21437/Interspeech.2013-463

Cite as: Song, M., Zhang, Q., Pan, J., Yan, Y. (2013) Discriminative pronunciation modeling based on minimum phone error training. Proc. Interspeech 2013, 1941-1945, doi: 10.21437/Interspeech.2013-463

  author={Meixu Song and Qingqing Zhang and Jielin Pan and Yonghong Yan},
  title={{Discriminative pronunciation modeling based on minimum phone error training}},
  booktitle={Proc. Interspeech 2013},