Clustered language models have the advantage of requiring less training data than n-grams, but they may perform worse if the training corpus is large enough to train the n-gram well. How does the performance of a clustered language model compare with that of an n-gram on the Wall Street Journal corpus? In addressing this question, we develop two ideas. First, an existing clustering algorithm is extended to deal with higher-order n-grams. Second, a heuristic to speed up the algorithm is presented. The resulting algorithm is used to cluster bigrams and trigrams on the Wall Street Journal corpus, and the language models it produces can compete with existing back-off models. In particular, when only little training data is available, the clustered models outperform the back-off models. This is important for practical recognition systems, where it is not always possible to obtain several million words of training text for a given domain.
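To illustrate why clustering reduces the data requirement, a minimal sketch of a class-based bigram model is given below. It factors the word-bigram probability as P(w2 | w1) = P(c(w2) | c(w1)) * P(w2 | c(w2)), so the model estimates class-to-class transitions instead of word-to-word transitions. The function names and the fixed toy class assignment are illustrative assumptions, not the paper's actual algorithm (which learns the clustering from data).

```python
from collections import Counter

# Sketch of a class-based (clustered) bigram model:
#   P(w2 | w1) = P(c(w2) | c(w1)) * P(w2 | c(w2))
# With V words and C << V classes, this needs O(C^2 + V) parameters
# instead of the O(V^2) of a word bigram, so it can be trained
# from far less data. The clustering here is given by hand; the
# paper's algorithm would instead learn word2class from the corpus.

def train_class_bigram(corpus, word2class):
    """Collect the counts needed for the factored model."""
    class_bigrams = Counter()   # counts of (c(w1), c(w2)) pairs
    class_unigrams = Counter()  # counts of c(w)
    word_in_class = Counter()   # counts of (c(w), w)
    for sent in corpus:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            class_unigrams[c] += 1
            word_in_class[(c, w)] += 1
        for c1, c2 in zip(classes, classes[1:]):
            class_bigrams[(c1, c2)] += 1
    return class_bigrams, class_unigrams, word_in_class

def prob(w2, w1, word2class, class_bigrams, class_unigrams, word_in_class):
    """Maximum-likelihood estimate of P(w2 | w1) under the class model."""
    c1, c2 = word2class[w1], word2class[w2]
    # P(c2 | c1): transition between classes
    from_c1 = sum(n for (a, _), n in class_bigrams.items() if a == c1)
    p_class = class_bigrams[(c1, c2)] / from_c1
    # P(w2 | c2): emission of the word from its class
    p_word = word_in_class[(c2, w2)] / class_unigrams[c2]
    return p_class * p_word
```

For example, with a hand-assigned clustering {the, a} -> DET, {dog, cat} -> N, the model can assign a nonzero probability to "a dog" even if that word pair never occurred in training, because "the dog" contributes to the same class bigram (DET, N).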
Bibliographic reference. Ueberla, Joerg P. (1995): "More efficient clustering of n-grams for statistical language modeling", In Proc. EUROSPEECH-1995, pp. 1257-1260.