This paper presents and analyzes improved algorithms for clustering words into equivalence classes for bigram and trigram models, and reports the results obtained with them: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering, so that large corpora (38 million words and more) can be clustered within a few days or even hours. 3) We extend the clustering approach from bigrams to trigrams. 4) We present experimental results on a 38-million-word training corpus.
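To make the notion of bigram word clustering concrete, the following is a minimal sketch. It assumes the standard class-bigram maximum-likelihood criterion and a naive exchange procedure (move one word at a time to the class that most increases the log-likelihood); the abstract does not specify either. The function names, the round-robin initialization, and the `max_iters` parameter are illustrative assumptions. The sketch recomputes all counts for every tentative move, so it is deliberately not the efficient implementation that the paper describes.

```python
import math
from collections import Counter


def class_bigram_log_likelihood(bigrams, cls):
    """Log-likelihood of bigram counts under p(w|v) = p(w|g_w) * p(g_w|g_v).

    bigrams: Counter mapping (v, w) -> count; cls: dict mapping word -> class id.
    """
    pair = Counter()  # N(g_v, g_w): class bigram counts
    pred = Counter()  # N(g_v): class counts in predecessor position
    succ = Counter()  # N(g_w): class counts in successor position
    word = Counter()  # N(w): word counts in successor position
    for (v, w), n in bigrams.items():
        pair[cls[v], cls[w]] += n
        pred[cls[v]] += n
        succ[cls[w]] += n
        word[w] += n
    ll = 0.0
    for (v, w), n in bigrams.items():
        p_class = pair[cls[v], cls[w]] / pred[cls[v]]  # p(g_w | g_v)
        p_word = word[w] / succ[cls[w]]                # p(w | g_w)
        ll += n * math.log(p_class * p_word)
    return ll


def exchange_clustering(tokens, num_classes, max_iters=10):
    """Naive exchange clustering (illustrative assumption, not the paper's code)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    # Hypothetical initialization: assign words to classes round-robin.
    cls = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_bigram_log_likelihood(bigrams, cls)
    for _ in range(max_iters):
        moved = False
        for w in vocab:
            home = cls[w]
            best_c = home
            # Try every other class and keep the move with the highest likelihood.
            for c in range(num_classes):
                if c == home:
                    continue
                cls[w] = c
                ll = class_bigram_log_likelihood(bigrams, cls)
                if ll > best + 1e-12:
                    best, best_c = ll, c
            cls[w] = best_c
            if best_c != home:
                moved = True
        if not moved:  # no word changed class: a local optimum was reached
            break
    return cls, best


if __name__ == "__main__":
    toy = "the cat sat on the mat the dog sat on the rug".split()
    classes, ll = exchange_clustering(toy, num_classes=3)
    print(classes, ll)
```

Each tentative move here costs a full pass over the bigram counts, which is only feasible for toy data. Updating just the counts affected by the moved word is the kind of saving that a time complexity analysis of such algorithms would address.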
Bibliographic reference. Martin, Sven / Liermann, Jörg / Ney, Hermann (1995): "Algorithms for bigram and trigram word clustering", In EUROSPEECH-1995, 1253-1256.