This paper introduces KPCatcher (keyphrase catcher). The value of our work lies in providing concrete solutions for building a real keyphrase extraction product for enterprise videos. KPCatcher is designed to robustly extract a ranked list of keyphrases from enterprise videos, independent of domain. It treats noun phrases in the transcript as candidate keyphrases and scores them by aggregating word-level scores. Using confidence-based and counting-based rules, KPCatcher handles transcription errors to prevent incorrect keyphrases from being surfaced to end users. Unlike previous work, we focus our experiments on automatic transcriptions of real enterprise videos from various domains. We thoroughly evaluate several well-known keyword ranking features and the denoising rules, using enterprise videos from several domains at various word error rates. We find term frequency to be the best feature and show that our denoising rules are very effective both in rejecting incorrect keyphrases and in increasing the overlap between top keyphrases and human-provided keyphrases. We also show that KPCatcher compares favorably with existing research systems on ICSI meeting data.
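The candidate scoring and denoising steps summarized above can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not KPCatcher's implementation. It assumes that the transcript carries a per-word ASR confidence, that noun-phrase spans come from an external chunker, and that word-level term frequencies are aggregated by averaging. The thresholds `MIN_CONFIDENCE` and `MIN_COUNT` are hypothetical stand-ins for the confidence-based and counting-based rules, and the names `Word` and `rank_keyphrases` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    confidence: float  # assumed per-word ASR confidence in [0, 1]


# Hypothetical thresholds: the abstract does not state KPCatcher's values.
MIN_CONFIDENCE = 0.5  # confidence-based rule
MIN_COUNT = 2         # counting-based rule


def rank_keyphrases(
    transcript: List[Word],
    noun_phrases: List[Tuple[int, int]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank noun-phrase candidates by the mean term frequency of their words.

    noun_phrases holds (start, end) token spans, end exclusive, assumed to
    come from an external noun-phrase chunker.
    """
    tf = Counter(w.text.lower() for w in transcript)
    counts: Counter = Counter()
    best_conf: Dict[str, float] = {}
    for start, end in noun_phrases:
        words = transcript[start:end]
        if not words:
            continue
        phrase = " ".join(w.text.lower() for w in words)
        counts[phrase] += 1
        # An occurrence is only as reliable as its least confident word.
        occ_conf = min(w.confidence for w in words)
        best_conf[phrase] = max(best_conf.get(phrase, 0.0), occ_conf)

    scored: List[Tuple[str, float]] = []
    for phrase, n in counts.items():
        # Denoising: drop rare candidates and ones never recognized confidently.
        if n < MIN_COUNT or best_conf[phrase] < MIN_CONFIDENCE:
            continue
        tokens = phrase.split()
        scored.append((phrase, sum(tf[t] for t in tokens) / len(tokens)))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    data = [("speech", 0.9), ("recognition", 0.9), ("improves", 0.9),
            ("speech", 0.9), ("recognition", 0.8), ("accuracy", 0.9),
            ("wreck", 0.2), ("ignition", 0.3), ("wreck", 0.3),
            ("ignition", 0.2), ("accuracy", 0.95), ("speech", 0.9),
            ("improvement", 0.99)]
    transcript = [Word(t, c) for t, c in data]
    spans = [(0, 2), (3, 5), (5, 6), (6, 8), (8, 10), (10, 11), (12, 13)]
    print(rank_keyphrases(transcript, spans))
    # → [('speech recognition', 2.5), ('accuracy', 2.0)]
```

In this toy input, the misrecognized "wreck ignition" is rejected by the confidence rule even though it occurs twice, and "improvement" is rejected by the counting rule because it occurs only once. Taking the minimum word confidence per occurrence is one plausible design choice: a single badly recognized word is enough to make a phrase suspect.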
Bibliographic reference. Xi, Yongxin Taylor / Paulik, Matthias / Gadde, Venkata Ramana / Sankar, Ananth (2013): "KPCatcher — a keyphrase extraction system for enterprise videos", In INTERSPEECH-2013, 1906-1910.