Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Multi-Stage Feature Normalization for Robust German Stressed/Unstressed Syllable Classification

Yuan-Fu Liao, Yan-Ting Chen, Jhen-Lun Huang

Institute of Computer and Communication Engineering, National Taipei University of Technology, Taiwan

To develop a German computer-assisted language learning (CALL) system for students whose mother tongues are syllable- or mora-timed, a multi-stage feature normalization scheme that takes both word stress and sentence intonation patterns into consideration is proposed for German stressed/unstressed syllable classification. The main idea is to first apply the Fujisaki model and band-pass filtering to the pitch and energy contours, respectively, to remove the undesired sentence intonation component, and then to normalize the extracted features sequentially at the syllable and supra-segment levels. Compared with a traditional Z-score feature normalization baseline, the proposed method achieved a lower classification error rate (27.04% vs. 31.34%) on the “The Kiel Corpus of Read Speech, Vol. I” database. In addition, by integrating decision tree-based feature selection and long-span contextual prosodic cues, the classification error rate was further reduced to 24.68%.
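The contrast between a single global Z-score and normalization within smaller units can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `z_score` and `grouped_z_score`, the phrase grouping, and the pitch values are all hypothetical, and the Fujisaki-model and band-pass filtering steps are omitted. It only shows how normalizing within a local unit can keep a relatively prominent syllable prominent even when the whole unit sits lower in the sentence contour.

```python
from statistics import mean, pstdev


def z_score(values):
    """Plain Z-score: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def grouped_z_score(values, group_ids):
    """Z-score each value against the other values that share its group id."""
    groups = {}
    for idx, gid in enumerate(group_ids):
        groups.setdefault(gid, []).append(idx)
    out = [0.0] * len(values)
    for indices in groups.values():
        normed = z_score([values[i] for i in indices])
        for i, z in zip(indices, normed):
            out[i] = z
    return out


# Hypothetical per-syllable mean pitch (Hz) for two phrases of one sentence;
# the second phrase is lower overall, as under a falling intonation contour.
pitch = [220.0, 180.0, 200.0, 170.0, 130.0, 150.0]
phrase = [0, 0, 0, 1, 1, 1]  # assumed grouping, not taken from the paper

global_norm = z_score(pitch)                # one Z-score over the sentence
local_norm = grouped_z_score(pitch, phrase) # Z-score within each phrase
```

In this toy input the fourth syllable is the highest in its phrase, yet the global Z-score places it below the sentence mean; the grouped version gives it the same positive score as the first syllable.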

Index Terms: prosodic feature normalization, German stressed/unstressed syllable classification, Fujisaki model


Bibliographic reference.  Liao, Yuan-Fu / Chen, Yan-Ting / Huang, Jhen-Lun (2012): "Multi-stage feature normalization for robust German stressed/unstressed syllable classification", In SP-2012, 210-213.