Speech Prosody 2010
Chicago, IL, USA
We consider methods for training a prosodic classifier using labeled training data from a genre different from the one on which the system will be deployed. Two binary word-level tasks are addressed: pitch accent detection and phrase boundary detection. Using radio news and conversational telephone speech, we evaluate cross-genre training with acoustic and textual features, and find that acoustic features transfer better than textual features in most cases. We also find that a single classifier trained on both genres nearly matches genre-dependent performance. We then examine some simple unsupervised domain adaptation approaches, including class proportion adjustment, sample selection bias correction, and feature normalization. With the exception of class proportion adjustment, which is slightly helpful in one case but proves unstable, none of the approaches improves cross-genre performance over the baseline.
Index Terms: prosody recognition, domain adaptation
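As an illustration of two of the adaptation ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that class proportion adjustment can be approximated by the standard prior-shift rescaling of a binary posterior, and that feature normalization means z-scoring each feature with statistics from its own genre. The abstract specifies neither formulation, nor how the target-genre class proportion would be estimated, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def adjust_class_proportion(p_pos, source_prior, target_prior):
    """Rescale a binary posterior P(y=1|x) learned under source_prior
    so that it reflects target_prior (textbook prior-shift correction;
    an assumption, not necessarily the method used in the paper)."""
    pos = p_pos * target_prior / source_prior
    neg = (1.0 - p_pos) * (1.0 - target_prior) / (1.0 - source_prior)
    return pos / (pos + neg)


def z_normalize(values):
    """Z-score one feature using the mean and standard deviation of the
    genre it comes from (assumed form of feature normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical numbers: the classifier saw 50% accented words in the
    # source genre, while the target genre is assumed to have 25%.
    print(adjust_class_proportion(0.5, source_prior=0.5, target_prior=0.25))
    # Hypothetical raw feature values from a single genre.
    print(z_normalize([1.0, 2.0, 3.0]))
```

When the source and target priors are equal, the adjustment leaves the posterior unchanged, so any gain from this step depends entirely on how well the target proportion is estimated.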
Bibliographic reference. Margolis, Anna / Ostendorf, Mari / Livescu, Karen (2010): "Cross-genre training for automatic prosody classification", In SP-2010, paper 113.