Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Experiments on Chinese Speech Recognition with Tonal Models and Pitch Estimation Using the Mandarin Speecon Data

Ying Sun (1,2), Daniel Willett (1), Raymond Brueckner (1), Rainer Gruhn (1), Dirk Bühler (2)

(1) Harman/Becker Automotive Systems, Germany; (2) University of Ulm, Germany

Automatic speech recognition of a tonal, syllabic language such as Mandarin Chinese poses new challenges but also offers new opportunities. We present approaches and experimental results concerning the choice of base units for acoustic modeling, pitch estimation, and the integration of pitch estimates into the modeling framework. The experimental evaluations are carried out both on relatively clean headset data and on noisy, reverberant distant-talking speech data. Results show that tonal base units reduce the word error rate by more than 30% compared with toneless base units, for both phoneme models and initial-final models. Integrating pitch as an additional feature stream yields a further improvement of more than 20% over the best tonal baseline system. In a two-stream modeling approach, the pitch stream distributions can be strongly tied, so the overall model size increases only moderately.
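The two-stream approach mentioned above combines a spectral stream and a pitch stream per HMM state, with the pitch distributions strongly tied. The following is a minimal sketch of how such a model could score a frame and why tying keeps the model small. It rests on several assumptions that are not taken from the paper: one diagonal Gaussian per stream instead of mixtures, hypothetical stream weights `w_spec` and `w_pitch`, a hypothetical 2-D pitch vector `[log F0, delta log F0]`, and a hypothetical tying scheme with one pitch distribution per tone. All class and function names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGaussian:
    """Diagonal-covariance Gaussian (single component, for illustration only)."""
    mean: list
    var: list

    def log_pdf(self, x):
        # Sum of per-dimension log N(x_d; mean_d, var_d).
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v)
            for xd, m, v in zip(x, self.mean, self.var)
        )

    def num_params(self):
        return len(self.mean) + len(self.var)


@dataclass
class TwoStreamState:
    spectral: DiagGaussian  # state-specific spectral (e.g. MFCC) distribution
    pitch_key: str          # key into a small shared pool of tied pitch distributions


def state_log_likelihood(state, pitch_pool, spec_obs, pitch_obs,
                         w_spec=1.0, w_pitch=1.0):
    """Multi-stream emission score: weighted sum of per-stream log-likelihoods."""
    return (w_spec * state.spectral.log_pdf(spec_obs)
            + w_pitch * pitch_pool[state.pitch_key].log_pdf(pitch_obs))


def parameter_count(states, pitch_pool):
    """Return (spectral, pitch) parameter counts; each tied pitch Gaussian counts once."""
    spectral = sum(s.spectral.num_params() for s in states)
    pitch = sum(g.num_params() for g in pitch_pool.values())
    return spectral, pitch


if __name__ == "__main__":
    # Hypothetical tying: one pitch Gaussian per tone (1-4, 5 = neutral tone).
    # The delta log F0 means below are made-up values for illustration.
    slopes = {1: 0.0, 2: 0.02, 3: -0.01, 4: -0.03, 5: 0.0}
    pitch_pool = {f"tone{t}": DiagGaussian([5.0, s], [0.05, 0.01])
                  for t, s in slopes.items()}
    # 1000 states with 39-dim spectral Gaussians; tone assignment is arbitrary here.
    states = [TwoStreamState(DiagGaussian([0.0] * 39, [1.0] * 39), f"tone{(i % 5) + 1}")
              for i in range(1000)]
    print(parameter_count(states, pitch_pool))  # (78000, 20)
    score = state_log_likelihood(states[0], pitch_pool, [0.1] * 39, [5.1, 0.02],
                                 w_pitch=0.5)
    print(score)
```

The weighted sum of per-stream log-likelihoods is the usual multi-stream HMM emission score. Because many states point to the same pitch Gaussian, the pitch stream adds only a handful of parameters next to the state-specific spectral stream, which is the effect the abstract describes.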


Bibliographic reference.  Sun, Ying / Willett, Daniel / Brueckner, Raymond / Gruhn, Rainer / Bühler, Dirk (2006): "Experiments on Chinese speech recognition with tonal models and pitch estimation using the Mandarin speecon data", In INTERSPEECH-2006, paper 1452-Tue3A2O.6.