ISCA Archive Interspeech 2013

Automatic estimation of dialect mixing ratio for dialect speech recognition

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

This paper proposes methods for determining an appropriate dialect mixing ratio in automatic speech recognition (ASR) of dialect speech. Training a language model on a dialect-mixed corpus has been reported to be effective for ASR across various dialects. One reason is the geographical continuity of spoken dialects; we therefore regard a speaker's dialect as a mixture of several dialects. The mixing ratio depends on the speaker and also changes from moment to moment. Recognition accuracy can be improved by supplying a mixing ratio appropriate to the speaker's dialect. Because this ratio is generally unknown, it must be estimated and updated from the input utterances. We examine two methods for updating it from recognition results: one computes the contribution of each dialect to each recognized word, and the other predicts the mixture from the whole recognized sentence using topic modeling. Experimental results show that the mixing ratios estimated by these methods yield higher recognition accuracy than a fixed mixing ratio.
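
The abstract does not give the update formula, so the following is only a minimal sketch of what a per-word contribution update could look like. It assumes the language model is a linear mixture of per-dialect models and uses a standard EM-style re-estimation of mixture weights. The function name `update_mixing_ratio`, the input layout, and the use of plain per-word probabilities are all hypothetical and are not taken from the paper.

```python
def update_mixing_ratio(ratio, word_probs):
    """One EM-style update of dialect mixing weights (illustrative only).

    ratio: current weights, one per dialect, summing to 1.
    word_probs: one entry per recognized word; each entry lists
        P(word | dialect model) for every dialect, in the same order
        as `ratio`. These probabilities are assumed to be given.
    """
    totals = [0.0] * len(ratio)
    for probs in word_probs:
        # Weighted likelihood of this word under each dialect model.
        weighted = [r * p for r, p in zip(ratio, probs)]
        norm = sum(weighted)
        if norm == 0.0:
            continue  # no dialect model explains this word; skip it
        # Contribution of each dialect to this word (sums to 1).
        for d, w in enumerate(weighted):
            totals[d] += w / norm
    total = sum(totals)
    if total == 0.0:
        return list(ratio)  # nothing to learn from; keep the old ratio
    # Average the contributions to obtain the new mixing ratio.
    return [t / total for t in totals]


# Hypothetical example: two dialects, two recognized words that both
# fit the first dialect model better than the second.
new_ratio = update_mixing_ratio([0.5, 0.5], [[0.3, 0.1], [0.3, 0.1]])
```

In this sketch the weight of the first dialect increases because both words are more probable under its model. Repeating the update over successive utterances would let the ratio track the speaker, which is the behavior the abstract describes at a high level.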

doi: 10.21437/Interspeech.2013-386

Cite as: Hirayama, N., Yoshino, K., Itoyama, K., Mori, S., Okuno, H.G. (2013) Automatic estimation of dialect mixing ratio for dialect speech recognition. Proc. Interspeech 2013, 1492-1496, doi: 10.21437/Interspeech.2013-386

  author={Naoki Hirayama and Koichiro Yoshino and Katsutoshi Itoyama and Shinsuke Mori and Hiroshi G. Okuno},
  title={{Automatic estimation of dialect mixing ratio for dialect speech recognition}},
  booktitle={Proc. Interspeech 2013},