4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In this paper we introduce a word clustering algorithm that uses a bilingual parallel corpus to group words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments show that the algorithm can effectively exploit the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
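For readers unfamiliar with the monolingual starting point, the following is a minimal sketch of the quantity that mutual information clustering typically tries to maximize: the average mutual information between the classes of adjacent words. It is not the bilingual algorithm of this paper and includes no translation model; the function name `average_mutual_information`, the `word_to_class` mapping, and the toy token sequence are assumptions made for illustration only.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between classes of adjacent tokens.

    Illustrative monolingual objective only; `word_to_class` is a
    hypothetical dict mapping each word to a class label.
    """
    classes = [word_to_class[w] for w in tokens]
    # Count adjacent class pairs (class bigrams).
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0

    # Marginal counts for the left and right positions of each bigram.
    left = Counter()
    right = Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in bigrams.items():
        p12 = n / total
        p1 = left[c1] / total
        p2 = right[c2] / total
        ami += p12 * math.log2(p12 / (p1 * p2))
    return ami


# Toy data (assumed): two words that strictly alternate.
tokens = "a b a b a b a".split()
separate = average_mutual_information(tokens, {"a": 0, "b": 1})
merged = average_mutual_information(tokens, {"a": 0, "b": 0})
```

In this toy case, keeping the two words in separate classes preserves the information each class carries about its neighbor, while merging them into one class discards it. A greedy clustering procedure would compare candidate merges by how little of this quantity they lose.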
Bibliographic reference. Wang, Ye-Yi / Lafferty, John / Waibel, Alex (1996): "Word clustering with parallel spoken language corpora", In ICSLP-1996, 2364-2367.