4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Word Clustering with Parallel Spoken Language Corpora

Ye-Yi Wang, John Lafferty, Alex Waibel

Carnegie Mellon University, Pittsburgh, PA, USA

In this paper we introduce a word clustering algorithm that uses a bilingual, parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively exploit the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
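For orientation, the monolingual starting point can be sketched as greedy agglomerative clustering that maximizes the average mutual information between the classes of adjacent words. The sketch below covers only that monolingual baseline. The translation-model term that the paper adds is not shown, and the function names (average_mutual_information, greedy_merge, cluster) are hypothetical. It also recomputes the objective from scratch for every candidate merge, which is simple to read but far too slow for a real vocabulary.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(tokens, word_class):
    """Average mutual information between the classes of adjacent tokens."""
    classes = [word_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    # Left and right marginals of the class-bigram distribution.
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in bigrams.items():
        p = n / total
        mi += p * math.log(p / ((left[c1] / total) * (right[c2] / total)))
    return mi


def greedy_merge(tokens, word_class):
    """Merge the pair of classes whose union keeps the highest average MI."""
    best = None
    for a, b in combinations(sorted(set(word_class.values())), 2):
        # Relabel every word in class b as class a.
        trial = {w: (a if c == b else c) for w, c in word_class.items()}
        score = average_mutual_information(tokens, trial)
        if best is None or score > best[0]:
            best = (score, trial)
    return best[1]


def cluster(tokens, num_classes):
    """Monolingual baseline only: no translation-model term (assumption)."""
    word_class = {w: w for w in set(tokens)}  # start with one class per word
    while len(set(word_class.values())) > num_classes:
        word_class = greedy_merge(tokens, word_class)
    return word_class
```

For example, cluster("the cat sat on the mat".split(), 2) returns a mapping from each word to one of two class labels. Merging classes can never increase the mutual information, so each greedy step picks the merge that loses the least.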


Bibliographic reference. Wang, Ye-Yi / Lafferty, John / Waibel, Alex (1996): "Word clustering with parallel spoken language corpora", In ICSLP-1996, 2364-2367.