Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech

Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong


Identifying a speaker’s native language from their speech in a second language is useful for many human-machine voice interface applications. In this paper, we use a sub-phone-based i-vector approach to identify non-native English speakers’ native languages from their English speech input. Time delay neural networks (TDNNs) are trained on LVCSR corpora to improve the alignment of speech utterances with their corresponding sub-phonemic “senone” sequences. The phonetic variability caused by a speaker’s native language can be better modeled with sub-phone models than with the conventional phone-model-based approach. Experimental results on the database released for the 2016 Interspeech ComParE Native Language challenge, covering 11 different L1s, show that our system outperforms the winning system of that challenge by a large margin (87.2% UAR vs. 81.3% UAR).
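The results above are reported in unweighted average recall (UAR), the official metric of the ComParE challenges: recall is computed per L1 class and then averaged with equal weight, so minority classes count as much as majority ones. A minimal sketch of the computation (the toy labels below are illustrative, not taken from the paper's data):

```python
def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # indices of utterances whose true label is class c
        true_c = [i for i, t in enumerate(y_true) if t == c]
        # recall for c: correctly predicted instances / true instances
        correct = sum(1 for i in true_c if y_pred[i] == c)
        recalls.append(correct / len(true_c))
    # each class contributes equally, regardless of its size
    return sum(recalls) / len(recalls)

# hypothetical L1 labels for six test utterances
y_true = ["ARA", "ARA", "CHI", "CHI", "CHI", "GER"]
y_pred = ["ARA", "CHI", "CHI", "CHI", "GER", "GER"]
print(f"UAR: {uar(y_true, y_pred):.4f}")  # (0.5 + 2/3 + 1.0) / 3
```

Unlike plain accuracy, UAR is not inflated by skewed class distributions, which is why it is the standard metric for paralinguistic classification tasks with imbalanced classes.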


DOI: 10.21437/Interspeech.2017-245

Cite as: Qian, Y., Evanini, K., Wang, X., Suendermann-Oeft, D., Pugh, R.A., Lange, P.L., Molloy, H.R., Soong, F.K. (2017) Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech. Proc. Interspeech 2017, 2586-2590, DOI: 10.21437/Interspeech.2017-245.


@inproceedings{Qian2017,
  author={Yao Qian and Keelan Evanini and Xinhao Wang and David Suendermann-Oeft and Robert A. Pugh and Patrick L. Lange and Hillary R. Molloy and Frank K. Soong},
  title={Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2586--2590},
  doi={10.21437/Interspeech.2017-245},
  url={http://dx.doi.org/10.21437/Interspeech.2017-245}
}