13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

On-the-fly Topic Adaptation for YouTube Video Transcription

Kapil Thadani (1), Fadi Biadsy (2), Dan Bikel (2)

(1) Department of Computer Science, Columbia University, New York, NY, USA
(2) Google Inc., New York, NY, USA

Automatic closed-captioning of video is a useful application of speech recognition technology but poses numerous challenges when applied to open-domain user-uploaded videos such as those on YouTube. In this work, we explore a strategy to improve decoding accuracy for video transcription by decoding each video with a language model (LM) adapted specifically to the topics that the video covers. Taxonomic topic classifiers are used to determine the topic content of videos and to build a large set of topic-specific LMs from web documents. We consider strategies for selecting and interpolating LMs in both supervised and unsupervised scenarios in a two-pass lattice rescoring framework. Experiments on a YouTube video corpus show a 10% relative reduction in WER over generic single-pass transcriptions as well as a statistically significant 2.5% reduction over rescoring with a very large non-adapted LM built from all the documents.
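
The selection and interpolation step described above can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme, so everything here is an assumption made for illustration: unigram LMs stored as dictionaries, top-k topic selection with weights proportional to classifier scores, a fixed `background_weight`, and an N-best list standing in for a lattice. The names `select_topics`, `interpolated_prob`, and `rescore`, along with all of the toy numbers, are hypothetical and are not taken from the paper.

```python
import math

def select_topics(topic_scores, top_k=2):
    """Keep the top_k topics by classifier score and renormalize to sum to 1."""
    ranked = sorted(topic_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(score for _, score in ranked)
    return {topic: score / total for topic, score in ranked}

def interpolated_prob(word, weights, topic_lms, background_lm, background_weight=0.5):
    """Linear interpolation of a background LM with a weighted mix of topic LMs."""
    topic_prob = sum(w * topic_lms[t].get(word, 0.0) for t, w in weights.items())
    return (background_weight * background_lm.get(word, 0.0)
            + (1.0 - background_weight) * topic_prob)

def rescore(nbest, weights, topic_lms, background_lm,
            background_weight=0.5, lm_scale=1.0, floor=1e-9):
    """Second pass: return the (words, acoustic_logscore) pair with the best combined score."""
    def total_score(hypothesis):
        words, acoustic_logscore = hypothesis
        lm_logprob = sum(
            # Floor the probability so unseen words do not produce log(0).
            math.log(max(interpolated_prob(w, weights, topic_lms,
                                           background_lm, background_weight), floor))
            for w in words
        )
        return acoustic_logscore + lm_scale * lm_logprob
    return max(nbest, key=total_score)

# Toy data (invented for illustration only).
topic_scores = {"cooking": 0.6, "sports": 0.3, "music": 0.1}
topic_lms = {
    "cooking": {"whisk": 0.2, "eggs": 0.3, "the": 0.4},
    "sports": {"risk": 0.05, "legs": 0.1, "the": 0.4},
    "music": {"the": 0.4},
}
background_lm = {"the": 0.5, "whisk": 0.01, "risk": 0.05, "eggs": 0.02, "legs": 0.04}
nbest = [
    (["whisk", "the", "eggs"], -10.0),
    (["risk", "the", "legs"], -9.8),
]

weights = select_topics(topic_scores, top_k=2)
adapted_best = rescore(nbest, weights, topic_lms, background_lm)
generic_best = rescore(nbest, {}, topic_lms, background_lm, background_weight=1.0)
```

In this toy setup the background LM alone prefers the acoustically stronger but off-topic hypothesis, while the topic-adapted mixture prefers the on-topic one. A real system would interpolate full n-gram models and rescore lattices rather than a two-entry N-best list.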

Bibliographic reference. Thadani, Kapil / Biadsy, Fadi / Bikel, Dan (2012): "On-the-fly topic adaptation for YouTube video transcription", in INTERSPEECH-2012, 210-213.