Speech Prosody 2006
Manually annotating the accent labels of a large speech corpus is both tedious and time-consuming. In this paper we investigate an automatic accent labeling procedure that uses classifiers trained on a limited amount of manually labeled data. Different methods are proposed and compared in a multi-classifier framework that includes a linguistic classifier, an acoustic classifier and a combined one. The linguistic classifier is first used to label POS-determined content words as accented and function words as unaccented. The resulting labels are then used to train accented and unaccented vowel HMMs separately, which form the acoustic classifier. The combined classifier then merges the decisions of the linguistic and acoustic classifiers to minimize labeling errors. Properly combined classifiers achieve better labeling performance than the linguistic or acoustic classifier alone. Performance improves further when the acoustic classifier is re-trained on the whole corpus after it has been relabeled by the combined classifier. The final accent labeling accuracy reaches 94.0%. Compared with 97.2%, the self-agreement ratio of a well-trained human annotator, this accuracy is fairly satisfactory.
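The flow from linguistic labels to acoustic scores to a combined decision can be illustrated with a minimal sketch. It is not the authors' implementation. The POS tag set, the function names, the `min_margin` threshold and the combination rule (trust the acoustic decision only when its log-likelihood margin is large, otherwise fall back to the linguistic label) are all assumptions made for illustration. The abstract does not specify how the combined classifier is built, and the HMM log-likelihoods are taken here as given inputs.

```python
# Hypothetical sketch: the tag set, threshold and combination rule are assumptions,
# not details taken from the paper.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed content-word POS tags


def linguistic_label(pos):
    """Content words are labeled accented (True), function words unaccented (False)."""
    return pos in CONTENT_POS


def acoustic_label(ll_accented, ll_unaccented):
    """Choose the vowel HMM with the higher log-likelihood.

    Returns the decision and the absolute log-likelihood margin.
    """
    margin = ll_accented - ll_unaccented
    return margin > 0, abs(margin)


def combined_label(pos, ll_accented, ll_unaccented, min_margin=2.0):
    """Combine both decisions (assumed rule, for illustration only).

    If the classifiers agree, keep the shared label. If they disagree, use the
    acoustic decision only when its margin is at least min_margin.
    """
    ling = linguistic_label(pos)
    acou, margin = acoustic_label(ll_accented, ll_unaccented)
    if ling == acou:
        return ling
    return acou if margin >= min_margin else ling
```

Under these assumptions, a function word whose accented HMM scores clearly higher, for example `combined_label("DET", -10.0, -15.0)`, is labeled accented, while the same word with a small margin keeps its linguistic label. Labels produced this way over the whole corpus could then serve as training data for re-estimating the HMMs, mirroring the re-training step described in the abstract.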
Bibliographic reference. Chen, Yining / Lai, Min / Chu, Min / Soong, Frank K. / Zhao, Yong / Hu, Fangyu (2006): "Automatic accent annotation with limited manually labeled data", In SP-2006, paper 112.