EUROSPEECH 2001 Scandinavia
The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts amongst large volumes of speech data. In this research, two phoneme-based topic spotting systems were evaluated on the Switchboard Corpus. Experiments [1,2] on the OGI Corpus suggested that the new Stochastic Method for the Automatic Recognition of Topics (SMART) yields a large improvement over the existing Euclidean Nearest Wrong Neighbours (ENWN) algorithm, which had outperformed competing systems in evaluations [3,4]. However, the small amount of data available for these experiments meant that more rigorous testing was required. We reimplemented the algorithm to run on the larger Switchboard Corpus, and report an improvement of SMART over ENWN characterised by a 35.8% reduction in ROC (receiver operating characteristic) error area. Statistical significance was demonstrated using a modified version of the McNemar test.
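The two evaluation quantities named above can be illustrated with a minimal sketch. It assumes that "ROC error area" means the area above the ROC curve (1 minus the area under it), and it shows the standard exact McNemar test, not the modified version used in the paper, whose details are not given here. All function names are hypothetical and chosen for illustration only.

```python
from math import comb


def roc_error_area(scores, labels):
    """Area above the ROC curve (assumed here to equal 1 - AUC).

    AUC is computed as the fraction of (target, non-target) pairs in which
    the target receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 1.0 - wins / (len(pos) * len(neg))


def relative_reduction(baseline, new):
    """Relative reduction of an error measure with respect to a baseline."""
    return (baseline - new) / baseline


def mcnemar_exact_p(b, c):
    """Two-sided exact (binomial) p-value of the standard McNemar test.

    b: test items that only system A classified correctly
    c: test items that only system B classified correctly
    Items on which both systems agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Under these assumptions, a figure such as the reported reduction would correspond to `relative_reduction(area_enwn, area_smart)`, where the two arguments (hypothetical names) are the error areas of the two systems on the same test set.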
Bibliographic reference. Theunissen, M. W. / Scheffler, K. / Preez, J. A. du (2001): "Phoneme-based topic spotting on the Switchboard Corpus", In EUROSPEECH-2001, 283-286.