ISCA Archive Interspeech 2013

Context-dependent phone mapping for LVCSR of under-resourced languages

Van Hai Do, Xiong Xiao, Eng Siong Chng, Haizhou Li

This paper presents a context-dependent phone mapping approach to acoustic modeling for large vocabulary speech recognition in under-resourced languages, leveraging well-trained models of other languages. Generally speaking, phone mapping can be viewed as a hybrid HMM/MLP (Hidden Markov Model / Multilayer Perceptron) model in which the input to the MLP consists of phone acoustic scores, e.g., likelihood or posterior scores. In this paper, we use deep neural networks trained on a large amount of Malay data to generate bottleneck and posterior features for the target English acoustic models. We extend the concept of phone mapping by using not only posteriors but also bottleneck features as the input for phone mapping. Experiments show that the phone mapping technique significantly outperforms the cross-lingual tandem approach. We also show that bottleneck and posterior features contain complementary information: combining these two feature streams as the input for phone mapping yields a consistent improvement.
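
As a rough illustration of the idea in the abstract, the sketch below concatenates source-language posterior and bottleneck features for each frame, maps them to target-language state posteriors with a small MLP, and converts those to scaled log-likelihoods as is usual in hybrid HMM/MLP decoding. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the layer sizes, the single tanh hidden layer, the use of log posteriors, the random weights, and the uniform priors are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis (one row per frame).
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def phone_mapping_forward(posteriors, bottleneck, params):
    # Combine the two source-language feature streams frame by frame.
    # Taking the log of the posteriors is an assumption made here.
    x = np.concatenate([np.log(posteriors + 1e-10), bottleneck], axis=1)
    # One hidden layer (hypothetical size) followed by a softmax over
    # target-language context-dependent states.
    h = np.tanh(x @ params["W1"] + params["b1"])
    return softmax(h @ params["W2"] + params["b2"])

def scaled_log_likelihoods(target_post, target_priors):
    # Hybrid HMM/MLP convention: divide state posteriors by state priors
    # (in the log domain) before passing scores to the HMM decoder.
    return np.log(target_post + 1e-10) - np.log(target_priors)

# Toy dimensions, all hypothetical: 5 frames, 40 source phone posteriors,
# 39 bottleneck dimensions, 64 hidden units, 120 target tied states.
rng = np.random.default_rng(0)
T, n_src, n_bn, n_hid, n_tgt = 5, 40, 39, 64, 120

posteriors = softmax(rng.normal(size=(T, n_src)))  # stand-in for source DNN output
bottleneck = rng.normal(size=(T, n_bn))            # stand-in for bottleneck features

params = {
    "W1": rng.normal(scale=0.1, size=(n_src + n_bn, n_hid)),
    "b1": np.zeros(n_hid),
    "W2": rng.normal(scale=0.1, size=(n_hid, n_tgt)),
    "b2": np.zeros(n_tgt),
}

target_post = phone_mapping_forward(posteriors, bottleneck, params)
loglik = scaled_log_likelihoods(target_post, np.full(n_tgt, 1.0 / n_tgt))
```

In practice the mapping weights would be trained on the limited target-language data and the priors estimated from its state alignments; random weights and uniform priors are used here only to keep the example self-contained.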

doi: 10.21437/Interspeech.2013-143

Cite as: Do, V.H., Xiao, X., Chng, E.S., Li, H. (2013) Context-dependent phone mapping for LVCSR of under-resourced languages. Proc. Interspeech 2013, 500-504, doi: 10.21437/Interspeech.2013-143

  author={Van Hai Do and Xiong Xiao and Eng Siong Chng and Haizhou Li},
  title={{Context-dependent phone mapping for LVCSR of under-resourced languages}},
  booktitle={Proc. Interspeech 2013},