4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper describes the development of a model for identifying points of prominence in speech. Such a model can serve as a first step in the intonational labeling of the corpora used by some speech synthesis systems (Black and Taylor, 1995). As a working definition, any starred ToBI pitch accent (Silverman et al., 1992), that is, H*, L*, L*+H, L+H*, or H+!H*, marks a point of prominence. The prominence detection model developed here is based on the sums-of-products vowel duration model (van Santen, 1992). Trained and tested on different portions of the Boston University Radio News corpus, the model achieves 86.3% correct identification with 12.5% false detection. These results are comparable to those of previous work (Wightman and Campbell, 1995), which reported 85.9% correct identification with 10.7% false detection. The advantage of this model is that it can be trained quickly on as few as 600 data points, which reduces the need for large corpora.
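The working definition of prominence and the two reported scores can be made concrete with a short sketch. It maps ToBI accent labels to binary prominence targets using exactly the five starred accents listed above, then scores a detector's binary decisions. The rate definitions are an assumption, not taken from the paper: correct identification is taken as the fraction of prominent items that are detected, and false detection as the fraction of non-prominent items that are wrongly flagged. The names `STARRED_ACCENTS`, `is_prominent`, and `detection_rates` are hypothetical.

```python
# Hypothetical sketch: ToBI labels -> prominence targets, plus two scoring rates.
# The rate definitions below are assumptions, not the paper's stated formulas.

# Working definition from the abstract: these starred accents are prominent.
STARRED_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*"}


def is_prominent(tobi_label):
    """Return True if the ToBI accent label is one of the starred accents."""
    return tobi_label in STARRED_ACCENTS


def detection_rates(gold_labels, predicted):
    """Return (correct identification %, false detection %).

    Assumed definitions (hypothetical):
      correct identification = detected prominent items / all prominent items
      false detection        = flagged non-prominent items / all non-prominent items
    """
    if len(gold_labels) != len(predicted):
        raise ValueError("gold_labels and predicted must have the same length")
    gold = [is_prominent(label) for label in gold_labels]
    hits = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_alarms = sum(1 for g, p in zip(gold, predicted) if not g and p)
    n_prominent = sum(gold)
    n_other = len(gold) - n_prominent
    correct = 100.0 * hits / n_prominent if n_prominent else 0.0
    false_det = 100.0 * false_alarms / n_other if n_other else 0.0
    return correct, false_det


if __name__ == "__main__":
    # Toy data: "" stands for an unaccented item.
    gold = ["H*", "", "L+H*", "", "L*", ""]
    pred = [True, False, True, True, False, False]
    print(detection_rates(gold, pred))
```

Reporting the two rates separately, rather than a single accuracy figure, keeps the trade-off visible: a detector can raise correct identification by flagging more items, but only at the cost of more false detections.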
Bibliographic reference. Maghbouleh, Arman (1996): "A logistic regression model for detecting prominences", In ICSLP-1996, 2443-2445.