Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese

Antje Schweitzer, Ngoc Thang Vu

We investigate tone recognition in Vietnamese across genders and dialects. In addition to well-known parameters such as single fundamental frequency (F0) values and energy features, we explore the impact of harmonicity on recognition accuracy, as well as that of the PaIntE parameters, which quantify the shape of the F0 contour over complete syllables rather than providing local single values. Using these new features for tone recognition on the GlobalPhone database, we observe significant improvements of approximately 1% in recognition accuracy when adding harmonicity, and of approximately another 4% when adding the PaIntE parameters. Furthermore, we analyze the influence of gender and dialect on recognition accuracy. The results show that tones are easier to recognize for female than for male speakers, and for the Northern than for the Southern dialect. Moreover, we achieve reasonable results when testing models across gender, while performance drops sharply when testing across dialects.

DOI: 10.21437/Interspeech.2016-405

Cite as

Schweitzer, A., Vu, N.T. (2016) Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese. Proc. Interspeech 2016, 1064-1068.

author={Antje Schweitzer and Ngoc Thang Vu},
title={Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese},
booktitle={Interspeech 2016},