Speech Prosody 2006
This paper presents a new framework for improved large vocabulary Mandarin speech recognition using prosodic features. The prosodic information is formulated as a probabilistic model that is compatible with the conventional maximum a posteriori (MAP) framework for large vocabulary speech recognition. A set of prosodic features that takes the special characteristics of Mandarin Chinese into account is developed, and both syllable-level and prosodic-word-level prosodic models are trained with the decision tree algorithm. A two-pass recognition process is used: each word arc in the word graph output by the first pass is rescored in the second pass using the two prosodic models. Experiments show reasonable improvements in recognition accuracy. This approach does not require a prosodically labeled training corpus, and it works for large-scale speaker-independent tasks.
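The second-pass rescoring described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the scores are combined, so the log-linear combination, the weights `w_lm`, `w_syl`, and `w_pw`, and the names `WordArc`, `rescore_arc`, and `best_path` are all assumptions made for illustration. The word graph is simplified to a list of candidate paths, and the prosodic log-probabilities are taken as given rather than computed by decision trees.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordArc:
    """One word arc from a first-pass word graph (hypothetical structure)."""
    word: str
    acoustic_logp: float          # first-pass acoustic log-likelihood
    lm_logp: float                # language model log-probability
    syllable_prosody_logp: float  # score from a syllable-level prosodic model
    word_prosody_logp: float      # score from a prosodic-word-level model


def rescore_arc(arc: WordArc, w_lm: float = 1.0,
                w_syl: float = 0.5, w_pw: float = 0.5) -> float:
    """Combine first-pass and prosodic scores log-linearly (assumed form)."""
    return (arc.acoustic_logp
            + w_lm * arc.lm_logp
            + w_syl * arc.syllable_prosody_logp
            + w_pw * arc.word_prosody_logp)


def best_path(paths: List[List[WordArc]], **weights: float) -> List[WordArc]:
    """Return the candidate path with the highest total rescored value."""
    return max(paths,
               key=lambda path: sum(rescore_arc(a, **weights) for a in path))


# Two competing single-arc hypotheses with made-up scores.
path_a = [WordArc("A", -10.0, -2.0, -5.0, -5.0)]
path_b = [WordArc("B", -11.0, -2.0, -1.0, -1.0)]

first_pass_choice = best_path([path_a, path_b], w_syl=0.0, w_pw=0.0)
second_pass_choice = best_path([path_a, path_b])
```

With the prosodic weights set to zero the selection reduces to the first-pass ranking; with non-zero weights, a hypothesis that the prosodic models favour can overtake one that scored slightly better on acoustic and language model evidence alone.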
Bibliographic reference. Huang, Jui-Ting / Lee, Lin-shan (2006): "Improved large vocabulary Mandarin speech recognition using prosodic features", In SP-2006, paper 233.