Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Language Modeling of Spontaneous Speech in a Court Context

P. E. Kenne, M. O'Kane, H. G. Pearcy

The University of Adelaide, South Australia, Australia

We model the language of the courts using a number of statistical techniques and compare the resulting models. For word-phrase bigram and word-phrase trigram models, an issue that arises is which tokens to join into word phrases. We compare two selection methods: choosing the pair with maximal mutual information, and assuming a binomial distribution of words and using a likelihood ratio test to choose pairs. The latter gives a greater reduction in perplexity. We also compare the two selection methods on a corpus that is not based on spoken material and find similar results.
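The two pair-selection criteria can be illustrated with a small Python sketch. This is not the authors' implementation, and several assumptions are made here: "mutual information" is taken to be pointwise mutual information over adjacent word pairs, the likelihood ratio test is the standard binomial log-likelihood ratio statistic comparing P(w2 | w1) with P(w2 | not w1), and all counts come from a single bigram table. The function names (`bigram_stats`, `best_pair`, `merge_pair`) and the underscore-joined phrase tokens are hypothetical.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Count adjacent pairs plus their first- and second-position marginals."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # c1: bigrams whose first word is w1
    second = Counter(tokens[1:])   # c2: bigrams whose second word is w2
    return bigrams, first, second, len(tokens) - 1


def mutual_information(c12, c1, c2, n):
    """Pointwise mutual information log2(P(w1 w2) / (P(w1) P(w2)))."""
    return math.log2(c12 * n / (c1 * c2))


def _log_l(k, m, p):
    """log of p^k (1-p)^(m-k); the binomial coefficient cancels in the ratio."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if m - k > 0:
        out += (m - k) * math.log(1.0 - p)
    return out


def log_likelihood_ratio(c12, c1, c2, n):
    """Binomial likelihood-ratio statistic -2 log(lambda) for the pair (w1, w2).

    H0: P(w2 | w1) == P(w2 | not w1) == c2 / n (the words are independent).
    H1: the two conditional probabilities differ.
    """
    p = c2 / n
    p1 = c12 / c1
    p2 = (c2 - c12) / (n - c1) if n > c1 else 0.0
    alt = _log_l(c12, c1, p1) + _log_l(c2 - c12, n - c1, p2)
    null = _log_l(c12, c1, p) + _log_l(c2 - c12, n - c1, p)
    return 2.0 * (alt - null)


def best_pair(tokens, score):
    """Return the adjacent pair with the highest score (first one wins ties)."""
    bigrams, first, second, n = bigram_stats(tokens)
    return max(bigrams,
               key=lambda bg: score(bigrams[bg], first[bg[0]], second[bg[1]], n))


def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single phrase token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = "your honour i object your honour".split()
print(best_pair(tokens, mutual_information))    # ('honour', 'i')
print(best_pair(tokens, log_likelihood_ratio))  # ('your', 'honour')
print(merge_pair(tokens, ("your", "honour")))
```

On this toy sequence, pointwise mutual information prefers a pair seen only once, a known tendency of that measure when counts are sparse. The likelihood ratio instead prefers the repeated pair "your honour". Repeating `best_pair` and `merge_pair` would grow a phrase vocabulary one pair at a time. The exact stopping rule and how the phrases feed the bigram and trigram models are not specified here.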


Bibliographic reference.  Kenne, P. E. / O'Kane, M. / Pearcy, H. G. (1995): "Language modeling of spontaneous speech in a court context", In EUROSPEECH-1995, 1801-1804.