5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Same News is Good News: Automatically Collecting Reoccurring Radio News Stories

Stefan Rapp (1), Grzegorz Dogil (2)

(1) Sony International (Europe) GmbH, Germany
(2) Institut f. Maschinelle Sprachverarbeitung, Univ. Stuttgart, Germany

We present methods for finding identical or nearly identical news stories in hourly radio news broadcasts spoken by the same or different announcers. These methods make it possible to build a large database of repeated, professionally read speech at low cost. Such a database is especially interesting for prosody research, but also, e.g., for concept-to-speech and socio-linguistic studies. An automatically recorded complete radio news broadcast is first segmented into individual news stories using HMM recognition. Then, the word sequence estimates of the stories are either compared directly (naive method) or realigned with the signal of other stories (realignment method) to determine which stories have been read before and which have not. Both methods can be further improved by computing "meta distances" that also take into account distances to other stories. We find that the realignment method combined with meta distances is the most reliable of the methods on real-life data.
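
As an illustration of the naive method and of meta distances, the following Python sketch may help. It is not the authors' implementation: the abstract does not define either distance, so the sketch assumes a length-normalized word-level edit distance for the direct comparison and, for the meta distance, the Euclidean distance between two stories' rows of the pairwise distance matrix. All function names and the toy stories are hypothetical.

```python
from math import sqrt


def word_edit_distance(a, b):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def naive_distance(a, b):
    """Assumed naive method: edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    return word_edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(stories):
    """Pairwise naive distances between all stories."""
    n = len(stories)
    return [[naive_distance(stories[i], stories[j]) for j in range(n)]
            for i in range(n)]


def meta_distance(matrix, i, j):
    """Assumed meta distance: compare the distance profiles of stories
    i and j, i.e. their distances to every story in the collection."""
    return sqrt(sum((matrix[i][k] - matrix[j][k]) ** 2
                    for k in range(len(matrix))))


# Toy word sequences standing in for recognizer output (invented data).
stories = [
    "the minister announced new tax plans today".split(),
    "the minister announced a new tax plan today".split(),
    "heavy rain caused flooding in the south".split(),
]
matrix = distance_matrix(stories)
```

Under these assumptions, two stories count as repetitions of each other when their meta distance is small, meaning that they are not only close to each other but also similarly far from all other stories. A decision threshold would have to be tuned on real data.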


Bibliographic reference. Rapp, Stefan / Dogil, Grzegorz (1998): "Same news is good news: automatically collecting reoccurring radio news stories", in ICSLP-1998, paper 0906.