This study presents a new algorithm for automatic accent evaluation of native and non-native speakers. The proposed system consists of two main steps: alignment and scoring. In the alignment step, the speech utterance is processed with a Weighted Finite State Transducer (WFST) based technique to automatically estimate pronunciation errors. In the scoring step, a Maximum Entropy (ME) based technique assigns perceptually motivated scores to those errors. Combining the two steps yields an approach that measures accent by the perceptual impact of pronunciation errors, termed the Perceptual WFST (P-WFST). The P-WFST is evaluated on American English (AE) spoken by native speakers and by non-native speakers (native speakers of Mandarin Chinese) from the CU-Accent corpus. Compared with the Goodness Of Pronunciation (GOP) algorithm, the proposed P-WFST algorithm shows higher and more consistent correlation with human-evaluated accent scores.
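The two-step structure (align, then score) can be illustrated with a minimal sketch. It is not the authors' implementation: a plain edit-distance alignment over phone labels stands in for the WFST-based aligner, and a hand-written weight table stands in for the ME-based perceptual scorer. The function names, the phone symbols, and the weight values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (expected phone, produced phone); "-" marks a deleted or inserted phone
Error = Tuple[str, str]


def align_errors(ref: List[str], hyp: List[str]) -> List[Error]:
    """Align canonical and produced phones by edit distance and
    return the mismatches (a stand-in for WFST-based alignment)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back through the table, collecting every non-matching step.
    errors: List[Error] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1]:
                errors.append((ref[i - 1], hyp[j - 1]))  # substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors.append((ref[i - 1], "-"))  # deletion
            i -= 1
        else:
            errors.append(("-", hyp[j - 1]))  # insertion
            j -= 1
    errors.reverse()
    return errors


def accent_score(errors: List[Error],
                 weights: Dict[Error, float],
                 default: float = 1.0) -> float:
    """Sum per-error weights (a stand-in for ME-based perceptual scoring)."""
    return sum(weights.get(e, default) for e in errors)


# Hypothetical weights: errors assumed to be more noticeable get larger values.
WEIGHTS: Dict[Error, float] = {("th", "s"): 2.5}

errors = align_errors(["th", "ih", "s"], ["s", "ih", "s"])
score = accent_score(errors, WEIGHTS)
```

In this sketch a higher score means a stronger perceived accent; weighting errors unequally is what separates a perceptually motivated score from a raw error count.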
Bibliographic reference. William, Freddy / Sangwan, Abhijeet / Hansen, John H. L. (2011): "Using human perception for automatic accent assessment", In INTERSPEECH-2011, 2509-2512.