INTERSPEECH 2011

The similarity language model is a statistical model that makes efficient use of long-distance information when it is available and falls back to a standard n-gram language model when it is not. To estimate the probability distribution for a given target context, each training example of the n-gram model is retrieved and its similarity to the target context is estimated. In this work, this is done by performing a string alignment and training the system to estimate the similarity of each possible alignment. The n-gram model treats all such examples as equal. Here, the more similar an example is to the current context, the more weight it receives when the probability distribution is estimated. The proposed model outperforms a modified Kneser-Ney 4-gram model.
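The weighting idea can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper. The similarity between two long-distance contexts is a fixed, hypothetical score, `1 / (1 + edit_distance)`, computed from a plain Levenshtein alignment, whereas the paper trains a system to score alignments. The long-distance context is the `window` words that precede the (n-1)-word history. The fallback to an ordinary n-gram relative frequency is applied only when the target has no long-distance context. All function names (`edit_distance`, `similarity`, `similarity_lm_distribution`) and the `window` parameter are illustrative.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (a plain string alignment)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]


def similarity(ctx_a, ctx_b):
    # Hypothetical fixed score: the paper instead *trains* an estimator
    # of the similarity of each possible alignment.
    return 1.0 / (1.0 + edit_distance(ctx_a, ctx_b))


def similarity_lm_distribution(corpus, target_context, n=3, window=5):
    """Similarity-weighted next-word distribution (illustrative sketch, n >= 2)."""
    h = n - 1
    history = tuple(target_context[-h:])
    long_target = target_context[-(h + window):-h]  # words before the history
    weights = defaultdict(float)
    for sent in corpus:
        for i in range(h, len(sent)):
            if tuple(sent[i - h:i]) != history:
                continue  # only training examples sharing the n-gram history
            long_ctx = sent[max(0, i - h - window):i - h]
            # Assumed fallback: no long-distance context -> equal weights,
            # i.e. ordinary n-gram relative frequencies.
            weights[sent[i]] += similarity(long_ctx, long_target) if long_target else 1.0
    total = sum(weights.values())
    # An empty result means the history was unseen; a real system would
    # back off to a lower order here (not shown).
    return {w: v / total for w, v in weights.items()} if total else {}


corpus = ["the cat sat on the mat".split(), "a dog sat on the rug".split()]
target = "the cat sat on the".split()
dist = similarity_lm_distribution(corpus, target, n=3, window=2)
print({w: round(p, 3) for w, p in dist.items()})  # {'mat': 0.667, 'rug': 0.333}
```

Both training examples share the history "on the", so a trigram maximum-likelihood estimate would give "mat" and "rug" equal probability. In this sketch the example whose earlier words ("cat sat") align exactly with the target receives twice the weight of the one that differs by one substitution ("dog sat"). With `window=0` there is no long-distance context, and the sketch returns the plain 0.5/0.5 n-gram estimate.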
Bibliographic reference. Gillot, Christian / Cerisara, Christophe (2011): "Similarity language model", In INTERSPEECH 2011, 1457-1460.