5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Language Modeling Based on a Hierarchical Approach: M_n^v

Imed Zitouni

LORIA / INRIA-Lorraine, France

In contrast to conventional n-gram approaches, which are the most widely used language models in continuous speech recognition systems, the multigram approach models a stream of variable-length sequences. To overcome the independence assumption of the classical multigram, we propose in this paper a hierarchical model that successively relaxes this assumption. We call this model M_n^v. The estimation of the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data, applied at different levels (j in 1...v). We show that estimates of the model parameters can be computed through an iterative Expectation-Maximization algorithm. A few experimental tests were carried out on a corpus extracted from the French newspaper "Le Monde". Results show that M_n^v outperforms the classical multigram and the interpolated bigram, but is comparable to the interpolated trigram model.
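
As a rough illustration of the classical multigram that the abstract takes as its starting point (not the hierarchical model itself, whose details are given only in the full paper), the likelihood of a sentence can be sketched as a sum over all segmentations into sequences of at most n words, with each sequence treated as independent of the others. The function name, the dictionary of sequence probabilities, and the toy values below are assumptions made for this example only.

```python
from typing import Dict, Sequence, Tuple


def multigram_likelihood(
    words: Sequence[str],
    seq_prob: Dict[Tuple[str, ...], float],
    max_len: int,
) -> float:
    """Sentence likelihood under a classical multigram (illustrative sketch).

    Sums, over every segmentation of `words` into sequences of at most
    `max_len` words, the product of the sequence probabilities. Sequences
    are assumed independent, which is the assumption the hierarchical
    model is said to relax.
    """
    n_words = len(words)
    # alpha[t] = total probability of all segmentations of words[:t]
    alpha = [0.0] * (n_words + 1)
    alpha[0] = 1.0
    for t in range(1, n_words + 1):
        # The last sequence of the segmentation covers words[t-k:t]
        for k in range(1, min(max_len, t) + 1):
            seq = tuple(words[t - k:t])
            alpha[t] += alpha[t - k] * seq_prob.get(seq, 0.0)
    return alpha[n_words]


# Hypothetical toy probabilities, chosen only to exercise the function
toy_probs = {("a",): 0.5, ("b",): 0.3, ("a", "b"): 0.2}
likelihood = multigram_likelihood(["a", "b"], toy_probs, max_len=2)
```

Here the two-word sentence has two segmentations, [a][b] and [a b], so the result is the sum of their probabilities. In an Expectation-Maximization setting, the same forward quantities, together with a matching backward pass, would typically supply the expected sequence counts used to re-estimate the sequence probabilities.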


Bibliographic reference. Zitouni, Imed (1998): "A language modeling based on a hierarchical approach: M_n^v", in ICSLP-1998, paper 0727.