EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


The Effect of Pitch and Lexical Tone on Different Mandarin Speech Recognition Tasks

Yiu Wing Wong (1), Eric Chang (2)

(1) The Chinese University of Hong Kong, Hong Kong, China (2) Microsoft Research China, China

Tone is an important component in Mandarin speech recognition: the five lexical tones must be recognized to disambiguate otherwise confusable words. Tone is acoustically characterized by the pitch contour, and the use of pitch has been shown to be helpful in Mandarin syllable recognition. This paper reports a comprehensive set of investigations into the effect of pitch on diverse Mandarin speech recognition tasks, namely large vocabulary continuous speech recognition (LVCSR) and isolated word recognition. Various techniques for utilizing pitch in acoustic modeling are examined, in particular the modeling of tone context dependence and the normalization of pitch values. The experimental results show that incorporating pitch yields an error reduction of 26% in tonal syllable recognition, and the same level of error reduction is attained in isolated word recognition. The gain from using pitch in an LVCSR task, on the other hand, is smaller. The results suggest that the use of pitch is more beneficial in Mandarin speech recognition when no language model is used. Speech recognizers may therefore be designed to make use of the pitch feature dynamically to obtain the best tradeoff between accuracy and computation.
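As an illustration of what normalization of pitch values can look like in practice, the following is a minimal Python sketch. It is not the method used in the paper, which the abstract does not specify. It assumes per-utterance mean and variance normalization of log-F0 over voiced frames; the function name `normalize_log_f0` and the convention that 0.0 marks an unvoiced frame are hypothetical choices made for this example.

```python
import math


def normalize_log_f0(f0_hz):
    """Per-utterance z-score normalization of log-F0 over voiced frames.

    f0_hz: per-frame F0 values in Hz; 0.0 marks an unvoiced frame
    (an assumed convention, not taken from the paper).
    Returns a list of the same length: voiced frames hold the normalized
    log-F0, unvoiced frames hold 0.0.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0.0]
    if not voiced:
        # No voiced frames: nothing to normalize.
        return [0.0] * len(f0_hz)

    mean = sum(voiced) / len(voiced)
    var = sum((v - mean) ** 2 for v in voiced) / len(voiced)
    std = math.sqrt(var)
    if std < 1e-12:
        # Guard against a perfectly flat contour.
        std = 1.0

    return [(math.log(f) - mean) / std if f > 0.0 else 0.0 for f in f0_hz]


# Same contour shape at two different pitch ranges (illustrative values).
low_voice = [100.0, 0.0, 120.0, 150.0]
high_voice = [200.0, 0.0, 240.0, 300.0]
norm_low = normalize_log_f0(low_voice)
norm_high = normalize_log_f0(high_voice)
```

Under these assumptions, two utterances with the same contour shape but different pitch ranges map to approximately the same normalized contour, so the feature reflects the tone shape rather than the speaker's absolute pitch level.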


Bibliographic reference.  Wong, Yiu Wing / Chang, Eric (2001): "The effect of pitch and lexical tone on different Mandarin speech recognition tasks", In EUROSPEECH-2001, 2741-2744.