13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations

Christos Koniaris, Olov Engwall, Giampiero Salvi

Centre for Speech Technology, School of Computer Science & Communication, KTH - Royal Institute of Technology, Stockholm, Sweden

This paper extends our previous work on automatic pronunciation error detection that exploits knowledge from psychoacoustic auditory models. The new system has two important additional features: auditory and acoustic processing of the temporal cues of the speech signal, and classification feedback from a trained linear dynamic model. We also perform a pronunciation analysis by treating the task as a classification problem. Finally, we evaluate the proposed methods by conducting a listening test on the same speech material and comparing the listeners' judgments with the output of the methods. The automatic analysis based on spectro-temporal cues is shown to have the best agreement with the human evaluation, particularly with that of language teachers, and with previous plenary linguistic studies.

Index Terms: L2 pronunciation error, auditory model, linear dynamic model, distortion measure, phoneme


Bibliographic reference. Koniaris, Christos / Engwall, Olov / Salvi, Giampiero (2012): "Auditory and dynamic modeling paradigms to detect L2 mispronunciations", In INTERSPEECH-2012, 899-902.