Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Prosody-Dependent Acoustic Modeling for Mandarin Speech Recognition

Tzu-Hsuan Chiu (1), Chen-Yu Chiang (1), Yuan-Fu Liao (2), Jyh-Her Yang (1), Yih-Ru Wang (1), Sin-Horng Chen (1)

(1) Department of Electrical Engineering, National Chiao Tung University, Taiwan
(2) Department of Electronic Engineering, National Taipei University of Technology, Taiwan

This paper reports a study on introducing prosodic information into acoustic modeling (AM) for speech recognition. It extends the conventional context-dependent (CD) triphone HMM approach so that each phone model also depends on the break type of the nearby inter-syllable boundary. Four break types are considered: major break, minor break, normal non-break, and tightly coupled non-break. In the training phase, breaks are labeled automatically by a previously proposed Prosody Labeling and Modeling algorithm. Prosody- and phonetic-dependent phone models are then constructed by standard decision-tree-based context clustering of HMMs. The effectiveness of the new AM was examined on a Mandarin syllable recognition task. Experimental results showed that the new approach outperformed the conventional CD-AM, achieving a better syllable recognition rate and producing a more efficient syllable lattice with a better trade-off between complexity and syllable coverage rate.
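As an illustration only, the idea of making a triphone depend on a break type can be sketched as a labeling step that runs before context clustering. The sketch below is not the authors' implementation. The tag names (MAJOR, MINOR, NORMAL, TIGHT), the "@" suffix, the "sil" utterance-edge symbol, the choice to tag each phone with the break that follows its syllable, and the function name are all assumptions made for the example. Only the "left-center+right" triphone notation follows common HTK-style practice.

```python
from typing import List, Sequence

# Hypothetical tags for the four break types named in the abstract.
BREAK_TYPES = ("MAJOR", "MINOR", "NORMAL", "TIGHT")


def prosody_dependent_labels(
    syllables: Sequence[Sequence[str]],
    breaks: Sequence[str],
    utt_edge: str = "sil",        # assumed symbol for utterance edges
    final_break: str = "MAJOR",   # assumed break after the last syllable
) -> List[str]:
    """Build triphone labels extended with a break-type tag.

    syllables: phones grouped by syllable, e.g. [["n", "i"], ["h", "au"]]
    breaks:    one break type per inter-syllable boundary,
               so len(breaks) == len(syllables) - 1
    """
    if not syllables:
        raise ValueError("at least one syllable is required")
    if len(breaks) != len(syllables) - 1:
        raise ValueError("need exactly one break per inter-syllable boundary")
    for b in breaks:
        if b not in BREAK_TYPES:
            raise ValueError(f"unknown break type: {b}")

    # Flatten the phones; tag each phone with the break that follows
    # its syllable (a simplifying assumption for this sketch).
    phones: List[str] = []
    tags: List[str] = []
    for i, syllable in enumerate(syllables):
        tag = breaks[i] if i < len(breaks) else final_break
        for phone in syllable:
            phones.append(phone)
            tags.append(tag)

    # Emit "left-center+right@BREAK" labels.
    labels: List[str] = []
    for j, phone in enumerate(phones):
        left = phones[j - 1] if j > 0 else utt_edge
        right = phones[j + 1] if j + 1 < len(phones) else utt_edge
        labels.append(f"{left}-{phone}+{right}@{tags[j]}")
    return labels


if __name__ == "__main__":
    for label in prosody_dependent_labels([["n", "i"], ["h", "au"]], ["TIGHT"]):
        print(label)
```

Under these assumptions, labels of this kind could be given to a decision-tree clustering step, where questions about the break tag would sit alongside the usual phonetic-context questions.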

Index Terms: acoustic modeling, speech recognition, prosody-dependent acoustic model, prosodic break


Bibliographic reference.  Chiu, Tzu-Hsuan / Chiang, Chen-Yu / Liao, Yuan-Fu / Yang, Jyh-Her / Wang, Yih-Ru / Chen, Sin-Horng (2012): "Prosody-dependent acoustic modeling for Mandarin speech recognition", In SP-2012, 139-142.