4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A Logistic Regression Model for Detecting Prominences

Arman Maghbouleh

ATR Interpreting Telecommunications Research Labs, Kyoto, Japan; Department of Linguistics, Stanford University, Stanford, CA, USA

This paper describes the development of a model for identifying points of prominence in speech. The model can serve as a first step in the intonational labeling of corpora used by some speech synthesis systems (Black and Taylor, 1995). Under the working definition adopted here, the starred ToBI accents (Silverman et al., 1992), that is, H*, L*, L*+H, L+H*, and H+!H*, are prominent. The prominence detection model is based on the sums-of-products vowel duration model (van Santen, 1992). It was trained and tested on different portions of the Boston University Radio News corpus and achieves 86.3% correct identification with 12.5% false detection. These results are comparable to those of previous work (Wightman and Campbell, 1995), which reported 85.9% correct identification with 10.7% false detection. The advantage of this model is that it can be trained quickly on as few as 600 data points, reducing the need for large corpora.
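As a rough illustration of the kind of classifier and scoring the abstract refers to, the following is a minimal sketch of a generic logistic regression detector and of the two reported rates. It is not the paper's model: the two features (a normalized duration and a normalized energy value), the synthetic data, and the function names are all hypothetical, and the sums-of-products structure is not reproduced. The sketch also assumes that "correct identification" means the fraction of truly prominent items that are detected and "false detection" means the fraction of non-prominent items that are flagged; the abstract does not define either term.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (prominent) if the modeled probability reaches the threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


def detection_rates(y_true, y_pred):
    """Return (correct identification rate, false detection rate).

    Assumed definitions: hits / actual positives, false alarms / actual negatives.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)


def make_synthetic_data(n, seed=0):
    """Hypothetical two-feature data; NOT drawn from any real corpus."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        duration = rng.gauss(1.0 if label else -1.0, 1.0)  # hypothetical feature
        energy = rng.gauss(0.5 if label else -0.5, 1.0)    # hypothetical feature
        X.append([duration, energy])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_synthetic_data(600, seed=0)  # small training set
    X_test, y_test = make_synthetic_data(300, seed=1)    # held-out portion
    w, b = train_logistic(X_train, y_train)
    y_pred = [predict(w, b, x) for x in X_test]
    correct, false_det = detection_rates(y_test, y_pred)
    print(f"correct identification: {correct:.1%}, false detection: {false_det:.1%}")
```

The numbers this script prints describe only the synthetic data and say nothing about the results reported above. Reporting the two rates separately, as the abstract does, keeps the trade-off visible: lowering the decision threshold raises both.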


Bibliographic reference. Maghbouleh, Arman (1996): "A logistic regression model for detecting prominences", in ICSLP-1996, 2443-2445.