5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Speech, Silence, Music and Noise Classification of TV Broadcast Material

Ara Samouelian (1), Jordi Robert-Ribes (2), Mike Plumpe (3)

(1) University Of Wollongong, Australia
(2) Digital Media Information Systems, CSIRO Mathematical and Information Sciences, Australia
(3) Microsoft Corporation, USA

Speech processing can be of great help for indexing and archiving TV broadcast material. Broadcasting station standards will soon be digital, and there will be a huge increase in the use of speech processing techniques for maintaining the archives as well as accessing them. We present an application of information theory to the classification and automatic labelling of TV broadcast material into speech, music and noise. We use information theory to construct a decision tree from several different TV programs and then apply it to a different set of TV programs. We present classification results on training and test data sets. The frame-level correct classification rate was 95.5% for training data, while for test data it ranged from 60.4% to 84.5%, depending on TV program type. At the segment level, the correct recognition rate and accuracy on training data were 100% and 95.1%, respectively, while for test data the % correct ranged from 80% to 100% and the % accuracy ranged from 64.7% to 100%.
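
The abstract does not say which features or splitting criterion the decision tree uses. As a rough illustration only, the sketch below picks a threshold on a single per-frame feature by maximising information gain (the reduction in Shannon entropy of the class labels), a common information-theoretic criterion for growing decision trees. The function names, the single-feature setup and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting frames at `threshold` on one feature."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best one.

    Assumes at least two distinct feature values are present.
    """
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(midpoints, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy data: one feature value per frame and its hand label.
frame_feature = [0.1, 0.2, 0.8, 0.9]
frame_label = ["noise", "noise", "speech", "speech"]
split = best_threshold(frame_feature, frame_label)
print(split, information_gain(frame_feature, frame_label, split))
```

A full tree would apply this search recursively to each resulting subset and across several features. The sketch shows only a single node.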


Bibliographic reference.  Samouelian, Ara / Robert-Ribes, Jordi / Plumpe, Mike (1998): "Speech, silence, music and noise classification of TV broadcast material", In ICSLP-1998, paper 0620.