Speech Prosody 2004

Nara, Japan
March 23-26, 2004

A Maximum Likelihood Prosody Recognizer

Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Jennifer Cole

Department of Electrical and Computer Engineering and Department of Linguistics, University of Illinois at Urbana-Champaign, IL, USA

Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model, which models the distribution of phone-level acoustic-prosodic observations (pitch, duration, and energy), and an ANN-based language model, which models the word-level stochastic dependence between prosody and syntax. Our experiments on the Radio News Corpus show that the recognizer achieves 84% pitch accent recognition accuracy and 93% intonational phrase boundary (IPB) recognition accuracy in a leave-one-speaker-out task, exceeding previously reported results on the same corpus. The same recognizer is also tested on a subset of the Switchboard corpus, where the accuracies are lower but still significantly better than chance.

Bibliographic reference.  Chen, Ken / Hasegawa-Johnson, Mark / Cohen, Aaron / Cole, Jennifer (2004): "A maximum likelihood prosody recognizer", In SP-2004, 509-512.