EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Phoneme-based Topic Spotting on the Switchboard Corpus

M. W. Theunissen (1), K. Scheffler (2), J. A. du Preez (1)

(1) University of Stellenbosch, South Africa
(2) University of Cambridge, UK

The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts amongst large volumes of speech data. In this research, two phoneme-based topic spotting systems were evaluated on the Switchboard Corpus. Experiments [1,2] on the OGI Corpus suggested that the new Stochastic Method for the Automatic Recognition of Topics (SMART) yields a large improvement over the existing Euclidean Nearest Wrong Neighbours (ENWN) algorithm, which had outperformed competing systems in earlier evaluations [3,4]. However, those experiments used only a small amount of data, so more rigorous testing was required. We reimplemented the algorithm to run on the larger Switchboard Corpus and found that SMART improves on ENWN, reducing the ROC (receiver operating characteristic) error area by 35.8%. We demonstrated the statistical significance of this improvement using a modified version of the McNemar test.
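The two evaluation measures named above can be illustrated with a short sketch. It rests on assumptions the abstract does not confirm. First, the "ROC error area" is taken to be the area above the ROC curve (1 minus the AUC), computed from per-conversation detection scores. Second, the McNemar function is the *classical* test with continuity correction, not the modified version used in the paper, whose details are not given here. All function names are hypothetical.

```python
import math


def roc_error_area(scores_on_topic, scores_off_topic):
    """Area above the ROC curve (1 - AUC); assumed meaning of 'ROC error area'.

    AUC is computed with the Mann-Whitney formulation: the probability that a
    randomly chosen on-topic score exceeds a randomly chosen off-topic score,
    with ties counted as one half.
    """
    wins = 0.0
    for p in scores_on_topic:
        for n in scores_off_topic:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(scores_on_topic) * len(scores_off_topic))
    return 1.0 - auc


def relative_reduction(baseline_area, new_area):
    """Fractional reduction of an error area relative to a baseline system."""
    return (baseline_area - new_area) / baseline_area


def mcnemar_classical(b, c):
    """Classical McNemar test with continuity correction (NOT the paper's variant).

    b: trials the baseline gets right and the new system gets wrong.
    c: trials the new system gets right and the baseline gets wrong.
    Returns (chi-squared statistic, two-sided p-value) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-squared with 1 dof: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

As a reading aid, a 35.8% reduction means the SMART error area is 0.642 times the ENWN error area; `relative_reduction(a, 0.642 * a)` returns 0.358 for any positive `a`. The McNemar test only uses the discordant trials (`b` and `c`), which is why it suits paired comparisons of two classifiers on the same test set.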


Bibliographic reference. Theunissen, M. W. / Scheffler, K. / Preez, J. A. du (2001): "Phoneme-based topic spotting on the Switchboard Corpus", In EUROSPEECH-2001, 283-286.