Speech Prosody 2012
To develop a German computer-assisted language learning (CALL) system for students whose native languages are syllable- or mora-timed, a multi-stage feature normalization scheme that takes both word stress and sentence intonation patterns into account is proposed for German stressed/unstressed syllable classification. The main idea is to first apply the Fujisaki model to the pitch contour and band-pass filtering to the energy contour to remove the unwanted sentence intonation component, and then to normalize the extracted features sequentially at the syllable and supra-segmental levels. Compared with a traditional Z-score feature normalization baseline, the proposed method achieved a lower classification error rate (27.04% vs. 31.34%) on the Kiel Corpus of Read Speech, Vol. I. In addition, by integrating decision tree-based feature selection and long-span contextual prosodic cues, the error rate was further reduced to 24.68%.
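For reference, the Z-score baseline maps each feature value v to (v − μ)/σ, where μ and σ are computed over some normalization unit. The sketch below illustrates only that baseline, not the Fujisaki-based or band-pass stages of the proposed method, whose details the abstract does not give. It assumes per-syllable feature values (e.g. mean F0) tagged with a unit label such as an utterance ID. The function names `zscore_normalize` and `normalize_by_group` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_normalize(values):
    """Z-score normalize a list of per-syllable feature values.

    Returns (v - mean) / std for each value; if the standard deviation
    is zero (all values equal), returns zeros instead of dividing by zero.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over the unit
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_by_group(values, groups):
    """Apply Z-score normalization separately within each group label.

    `groups` holds one label per value (hypothetical: e.g. an utterance ID),
    so statistics are computed per utterance rather than over the corpus.
    """
    out = [0.0] * len(values)
    indices = {}
    for i, g in enumerate(groups):
        indices.setdefault(g, []).append(i)
    for idx in indices.values():
        normed = zscore_normalize([values[i] for i in idx])
        for i, z in zip(idx, normed):
            out[i] = z
    return out


# Usage: syllable mean F0 (Hz) for two hypothetical utterances.
f0 = [180, 220, 200, 150, 170, 160]
utt = ["u1", "u1", "u1", "u2", "u2", "u2"]
print(normalize_by_group(f0, utt))
```

Because each utterance is normalized on its own, the same relative pitch pattern yields the same Z-scores regardless of a speaker's overall register. That is also why a global or per-utterance Z-score alone cannot separate word stress from the slower sentence intonation trend that the proposed scheme removes first.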
Index Terms: prosodic feature normalization, German stressed/unstressed syllable classification, Fujisaki model
Bibliographic reference. Liao, Yuan-Fu / Chen, Yan-Ting / Huang, Jhen-Lun (2012): "Multi-stage feature normalization for robust German stressed/unstressed syllable classification", In SP-2012, 210-213.