EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Deriving Document Structure from Prosodic Cues

Martin Haase (1), Werner Kriechbaum (2), Gregor Möhler (1), Gerhard Stenzel (2)

(1) Universität Stuttgart, Germany
(2) IBM Deutschland Entwicklung, Germany

This study presents an approach to prosody-driven segmentation of speech data. The model is based solely on F0 contours and RMS envelopes; phoneme or word information from a speech recognizer is unnecessary. Using data from German broadcast news, we show how this prosodic information can be exploited to retrieve structural information about the spoken text. The suitability of a CART-like algorithm for utterance boundary prediction was evaluated on 7 five-minute news reports, using 28 reports as training material for the classification tree. Sentence boundaries were predicted with a precision of 93% at a recall of 88%.
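As a rough illustration of two quantities named in the abstract, the sketch below computes a frame-wise RMS envelope and scores predicted boundaries by precision and recall. It is not the authors' implementation: the function names, the frame and hop parameters, and the exact-match scoring of boundary positions are assumptions made for this example, and the input values are toy data rather than results from the paper.

```python
import math


def rms_envelope(samples, frame_len, hop):
    """Return one RMS value per frame of `frame_len` samples, advancing by `hop`.

    Hypothetical helper: the frame length and hop size are illustrative
    parameters, not values taken from the paper.
    """
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions.

    Assumption: a prediction counts as correct only if it matches a
    reference position exactly; a real evaluation might allow a tolerance.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy usage: a loud frame followed by a silent one, and made-up boundaries.
envelope = rms_envelope([1, -1, 1, -1, 0, 0, 0, 0], frame_len=4, hop=4)
precision, recall = precision_recall([1, 2, 3, 4], [2, 3, 4, 5, 6])
```

Precision is the share of predicted boundaries that are correct, while recall is the share of true boundaries that were found, which is how the two figures reported in the abstract should be read.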


Bibliographic reference. Haase, Martin / Kriechbaum, Werner / Möhler, Gregor / Stenzel, Gerhard (2001): "Deriving document structure from prosodic cues", in EUROSPEECH-2001, 2157-2160.