13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

Jun Ogata, Masataka Goto

National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan

This paper presents a language-model training method for improving automatic transcription of online spoken content. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome the difficulty of preparing such task-specific language models in advance, we propose collaborative training of language models on the basis of the wisdom of crowds. On our public web service for LVCSR-based spoken document retrieval, PodCastle, over half a million recognition errors were corrected by anonymous users. By leveraging such corrected transcriptions, component language models for various topics can be built and dynamically mixed to generate an appropriate language model for each podcast episode in an unsupervised manner. Experimental results with Japanese podcasts showed that the mixed language models significantly reduced the word error rate.
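The dynamic mixing described above is commonly realized as linear interpolation of component language models, with mixture weights estimated on the target episode by EM. The paper does not give its exact formulation, so the following is a minimal illustrative sketch assuming unigram component models and add-one smoothing; all function names (`train_unigram`, `em_mixture_weights`) are hypothetical.

```python
# Hypothetical sketch: unsupervised estimation of linear-interpolation
# weights for mixing topic-specific component language models.
# Assumes unigram components for simplicity; real systems would use
# smoothed n-gram models, but the EM weight update is analogous.

from collections import Counter

def train_unigram(corpus, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary."""
    counts = Counter(corpus)
    total = len(corpus)
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def em_mixture_weights(components, episode_words, iters=20):
    """EM estimation of mixture weights lambda_i maximizing the
    interpolated likelihood of the episode's (recognized) word sequence."""
    k = len(components)
    lam = [1.0 / k] * k  # uniform initialization
    for _ in range(iters):
        resp_sums = [0.0] * k
        for w in episode_words:
            # E-step: posterior responsibility of each component for w
            probs = [lam[i] * components[i][w] for i in range(k)]
            z = sum(probs)
            for i in range(k):
                resp_sums[i] += probs[i] / z
        # M-step: weights are normalized responsibility totals
        lam = [s / len(episode_words) for s in resp_sums]
    return lam
```

An episode whose words skew toward one topic's component model receives a correspondingly larger weight for that component, yielding a per-episode mixed model without any manual supervision.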

Index Terms: web service, LVCSR, language modeling, wisdom of crowds, error correction


Bibliographic reference. Ogata, Jun / Goto, Masataka (2012): "PodCastle: collaborative training of language models on the basis of wisdom of crowds", In INTERSPEECH-2012, 2370-2373.