Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Improved Tone Modeling for Mandarin Broadcast News Speech Recognition

Xin Lei (1), Manhung Siu (2), Mei-Yuh Hwang (1), Mari Ostendorf (1), Tan Lee (3)

(1) University of Washington, USA; (2) Hong Kong University of Science & Technology, China; (3) Chinese University of Hong Kong, China

Tone plays a crucial role in Mandarin speech in distinguishing otherwise ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, in which tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded approach (using improved F0 smoothing) with explicit tone modeling when rescoring the output lattices. Oracle experiments indicate that a 32% relative improvement can be achieved by rescoring with perfect tone information. Recognition experiments on Mandarin broadcast news show that, even with an accuracy of only 70%, the explicit tone classifier provides complementary knowledge and improves performance significantly. By combining these tone modeling techniques, the character error rate on the CTV test set is reduced from 13.0% to 11.5%.
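To make the idea of explicit tone rescoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple log-linear combination of acoustic, language model, and tone classifier scores, and it rescores a toy N-best list rather than a full lattice. The `Hypothesis` class, the `lm_weight` and `tone_weight` parameters, and all scores in `toy_nbest` are hypothetical values chosen for illustration. The two hypotheses differ only in the tone of the last syllable (mai3 "buy" vs. mai4 "sell"). The sketch also shows that the reported drop in character error rate from 13.0% to 11.5% corresponds to roughly an 11.5% relative reduction.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """One N-best entry (hypothetical structure, for illustration only)."""
    words: str
    acoustic_logp: float     # log-likelihood from the tonal acoustic model (embedded F0 features)
    lm_logp: float           # language model log-probability
    tone_logps: List[float]  # explicit tone classifier log-posterior, one per syllable


def combined_score(h: Hypothesis, lm_weight: float, tone_weight: float) -> float:
    """Assumed log-linear combination; the weights would normally be tuned on held-out data."""
    return h.acoustic_logp + lm_weight * h.lm_logp + tone_weight * sum(h.tone_logps)


def rescore(nbest: Sequence[Hypothesis], lm_weight: float = 10.0,
            tone_weight: float = 1.0) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, lm_weight, tone_weight))


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction, e.g. for CER 13.0% -> 11.5%."""
    return (baseline - improved) / baseline


# Made-up scores: the acoustic model slightly prefers the wrong tone (mai4),
# while the explicit tone classifier strongly prefers mai3.
toy_nbest = [
    Hypothesis("wo3 yao4 mai3", acoustic_logp=-100.0, lm_logp=-5.0,
               tone_logps=[-0.1, -0.2, -0.3]),
    Hypothesis("wo3 yao4 mai4", acoustic_logp=-99.5, lm_logp=-5.0,
               tone_logps=[-0.1, -0.2, -2.0]),
]

if __name__ == "__main__":
    print("without tone scores:", rescore(toy_nbest, tone_weight=0.0).words)
    print("with tone scores:   ", rescore(toy_nbest, tone_weight=1.0).words)
    print("relative CER reduction: %.1f%%" % (100 * relative_reduction(13.0, 11.5)))
```

With `tone_weight=0.0` the acoustic preference selects "wo3 yao4 mai4"; adding the tone term flips the decision to "wo3 yao4 mai3". A log-linear combination is used here only because it is a common, easy-to-tune way to add an extra knowledge source to rescoring; the paper may weight or integrate the tone scores differently.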


Bibliographic reference.  Lei, Xin / Siu, Manhung / Hwang, Mei-Yuh / Ostendorf, Mari / Lee, Tan (2006): "Improved tone modeling for Mandarin broadcast news speech recognition", In INTERSPEECH-2006, paper 1752-Tue3A2O.4.