Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning

Longfei Yang, Kaiqi Fu, Jinsong Zhang, Takahiro Shinozaki

Pronunciation erroneous tendencies (PETs) are designed to provide instructive feedback that guides non-native language learners in correcting their pronunciation errors, so PET detection plays an important role in computer-aided pronunciation training (CAPT) systems. However, PET detection suffers from data sparsity because collecting and annotating non-native speech is time-consuming. In this paper, we propose an unsupervised learning framework based on contrastive predictive coding (CPC) that extracts knowledge from a large amount of unlabeled speech in two native languages and then transfers this knowledge to the PET detection task. Within this framework, language adversarial training is incorporated to guide the model to align the feature distributions of the two languages. In addition, a sinc filter is introduced to extract formant-like features, which are considered relevant to some kinds of pronunciation errors. Experiments on the Japanese part of the BLCU inter-Chinese speech corpus show that the proposed language adversarial representation learning effectively improves the performance of pronunciation erroneous tendency detection for non-native language learners.
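The unsupervised framework above is built on contrastive predictive coding. The sketch below shows the InfoNCE objective that CPC optimizes for a single prediction step. It follows the original CPC formulation, where a context vector c_t, mapped by a step-specific linear predictor W_k, must pick out the true future frame z_{t+k} from a set of negative frames by dot-product score. It assumes the projection W_k c_t has already been computed. The function names and plain-list vectors are hypothetical illustrations, and the language-adversarial branch and sinc front-end described in the abstract are not modeled here.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(predicted: Sequence[float],
                  positive: Sequence[float],
                  negatives: List[Sequence[float]]) -> float:
    """InfoNCE loss for one CPC prediction step (illustrative sketch).

    predicted : W_k c_t, the context vector after a step-specific linear
                predictor (assumed to be computed elsewhere)
    positive  : z_{t+k}, the encoded frame actually observed k steps ahead
    negatives : encoded frames sampled from other positions or utterances
    """
    # Score every candidate; the positive sits at index 0.
    scores = [dot(predicted, positive)] + [dot(predicted, z) for z in negatives]
    # Numerically stable log-sum-exp over all candidate scores.
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    # Cross-entropy of classifying the positive correctly: -log softmax[0].
    return log_sum - scores[0]
```

For example, `info_nce_loss([1.0, 0.0], [5.0, 0.0], [[-5.0, 0.0], [0.0, 5.0]])` is close to zero because the positive clearly wins, whereas swapping the positive with the first negative yields a loss of roughly 10. In CPC this loss is averaged over time steps t and prediction offsets k, so minimizing it pushes the context network to encode information that is predictive of upcoming speech frames.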

DOI: 10.21437/Interspeech.2020-2033

Cite as: Yang, L., Fu, K., Zhang, J., Shinozaki, T. (2020) Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning. Proc. Interspeech 2020, 3042-3046, DOI: 10.21437/Interspeech.2020-2033.

  author={Longfei Yang and Kaiqi Fu and Jinsong Zhang and Takahiro Shinozaki},
  title={{Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning}},
  booktitle={Proc. Interspeech 2020},