This paper compares transcription agreement between native and non-native annotators on a corpus of non-native English speech produced by Korean learners. Ten non-native annotators and three native annotators participated in the transcription of 608 sentences. All annotators were provided with forced-aligned phone sequences and were asked to correct any phone that was realized differently. Transcription agreement is calculated by counting the number of identically labeled phones for all pairs of annotators. Overall agreement and agreement by phone category are measured among non-native annotators, among native annotators, and among both native and non-native annotators. The overall transcription agreement for the three groups is 88.35%, 88.76%, and 87.83%, respectively. By category, vowel agreement is 84.43% among non-natives, 88.38% among natives, and 85.75% between non-natives and natives, whereas consonant agreement is 89.20%, 88.82%, and 88.34%, respectively. In sum, the results indicate that transcription by non-native annotators is close to that by native annotators in terms of transcription agreement.
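The pairwise counting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every annotator's output for an utterance has the same length as the forced-aligned sequence (corrections are substitutions only), and that agreement is pooled over all pairs by dividing total matches by total compared phones. The function names and the ARPAbet-style toy labels are hypothetical.

```python
from itertools import combinations, product


def pair_agreement(a, b):
    """Return (matching phones, compared phones) for two aligned label sequences."""
    if len(a) != len(b):
        # Assumption: annotators only relabel phones, so lengths stay equal.
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)


def pooled_agreement(pairs):
    """Percentage of identically labeled phones, pooled over all given pairs."""
    matched = total = 0
    for a, b in pairs:
        m, n = pair_agreement(a, b)
        matched += m
        total += n
    return 100.0 * matched / total


def within_group(annotations):
    """Agreement over every unordered pair of annotators inside one group."""
    return pooled_agreement(combinations(annotations, 2))


def between_groups(group_a, group_b):
    """Agreement over every pair made of one annotator from each group."""
    return pooled_agreement(product(group_a, group_b))


# Toy data (hypothetical): one five-phone utterance labeled by each annotator.
non_native = [
    ["DH", "AH", "K", "AE", "T"],
    ["DH", "AH", "K", "EH", "T"],
    ["D", "AH", "K", "AE", "T"],
]
native = [
    ["DH", "AH", "K", "AE", "T"],
    ["DH", "AH", "K", "AE", "T"],
]

print(f"non-native: {within_group(non_native):.2f}%")
print(f"native: {within_group(native):.2f}%")
print(f"between groups: {between_groups(non_native, native):.2f}%")
```

Categorical agreement (vowels versus consonants) could be obtained the same way by restricting the comparison to positions whose reference phone belongs to the category of interest. How the paper assigns a position to a category when annotators disagree is not stated in the abstract.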
Index Terms: non-native speech, corpus, transcription agreement, non-native annotator, native annotator
Bibliographic reference. Ryu, Hyuksu / Kim, Sunhee / Chung, Minhwa (2012): "Comparing transcription agreement on non-native English speech corpus between native and non-native annotators", In INTERSPEECH-2012, 2366-2369.