Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Detection of Voice Onset Time Contrasts for Use in Pronunciation Assessment

Abe Kazemzadeh (1), Joseph Tepperman (1), Jorge Silva (1), Hong You (2), Sungbok Lee (1), Abeer Alwan (2), Shrikanth Narayanan (1)

(1) University of Southern California, USA; (2) University of California at Los Angeles, USA

This study examines methods for recognizing different classes of phones in accented speech based on voice onset time (VOT). The methods are tested on data from the Tball corpus of Los Angeles-area elementary school children [1]. Three methods are proposed and tested: 1) training models based on standard English VOT contrasts and then extracting the VOT characteristics of the phones by measuring the duration of phone-level and sub-phone-level alignments, 2) training phone models with explicit aspiration, and 3) training separate models for phoneme classes with different VOTs. Error rates of 23-53% across phone classes are reported for the first method, 5-57% for the second, and 0-36% for the third. The results show that different methods work better on different phone classes. We interpret these results in relation to past research on VOT, explain possible uses for these findings, and propose directions for future research.
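As a rough illustration of the first method's idea of reading VOT off alignment durations, the following minimal Python sketch measures the duration of a release segment in a forced alignment and labels it short-lag or long-lag. It is not the authors' implementation: the Segment type, the sub-phone labels ("p_closure", "p_release"), the example timings, and the 30 ms threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    label: str    # hypothetical sub-phone label, e.g. "p_closure" or "p_release"
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def vot_ms(segments: List[Segment], release_label: str) -> Optional[float]:
    """Return the duration in ms of the first segment labeled release_label.

    Returns None if no such segment is present in the alignment.
    """
    for seg in segments:
        if seg.label == release_label:
            return (seg.end - seg.start) * 1000.0
    return None


def classify_vot(vot: float, threshold_ms: float = 30.0) -> str:
    """Label a VOT value using an assumed (illustrative) threshold in ms."""
    return "long-lag" if vot >= threshold_ms else "short-lag"


# Made-up alignment for a single /p/ followed by a vowel.
alignment = [
    Segment("p_closure", 0.100, 0.160),
    Segment("p_release", 0.160, 0.215),
    Segment("aa", 0.215, 0.400),
]

vot = vot_ms(alignment, "p_release")
if vot is not None:
    print(f"VOT = {vot:.1f} ms, class = {classify_vot(vot)}")
```

In practice the threshold would need to be chosen per phone class and tuned on held-out data, since a single fixed value is only a simplification.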


Bibliographic reference. Kazemzadeh, Abe / Tepperman, Joseph / Silva, Jorge / You, Hong / Lee, Sungbok / Alwan, Abeer / Narayanan, Shrikanth (2006): "Automatic detection of voice onset time contrasts for use in pronunciation assessment", in INTERSPEECH-2006, paper 1884-Mon3FoP.8.