Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Automatic Clustering of Words for Probabilistic Language Models

Loreia Moisa (1), Egidio Giachin (2)

(1) Polytechnic University of Bucharest, Bucuresti, Romania
(2) CSELT - Centro Studi e Laboratori Telecomunicazioni, Torino, Italy

In this work we compare different methods for clustering words into equivalence classes within a bigram language model, for a domain-specific recognition task (train timetable enquiry). Although the perplexity values obtained by the various methods differ, the word error rates eventually achieved are very similar. We examine this behavior in light of the word usage peculiarities present in these types of tasks.
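
As an illustration of the quantities mentioned above, the following minimal sketch shows how a word-to-class mapping can enter a bigram model and how perplexity can be computed from it. It assumes the standard class-based factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i) with unsmoothed maximum-likelihood estimates; the abstract does not state which formulation or clustering criteria the authors used, and the toy sentences, class labels, and function names here are hypothetical.

```python
import math
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Estimate unsmoothed maximum-likelihood class-bigram parameters."""
    class_bigrams = Counter()  # count(c_prev, c)
    class_context = Counter()  # count(c_prev) as a left context
    class_total = Counter()    # count(c) over all tokens
    word_total = Counter()     # count(w) over all tokens
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-class (an assumption of this sketch)
        for w in sent:
            c = word2class[w]
            class_bigrams[(prev, c)] += 1
            class_context[prev] += 1
            class_total[c] += 1
            word_total[w] += 1
            prev = c

    def prob(prev_class, w):
        # P(w | prev_class) = P(c | prev_class) * P(w | c)
        # No smoothing: unseen class transitions yield 0 or a division by zero.
        c = word2class[w]
        p_class = class_bigrams[(prev_class, c)] / class_context[prev_class]
        p_word = word_total[w] / class_total[c]
        return p_class * p_word

    return prob

def perplexity(sentences, word2class, prob):
    """Perplexity = 2 ** (-average log2 probability per word)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        prev = "<s>"
        for w in sent:
            log_sum += math.log2(prob(prev, w))
            n += 1
            prev = word2class[w]
    return 2 ** (-log_sum / n)

# Hypothetical toy data in the spirit of a train timetable enquiry task.
sentences = [["train", "to", "rome"], ["train", "to", "milan"]]
word2class = {"train": "NOUN", "to": "PREP", "rome": "CITY", "milan": "CITY"}

prob = train_class_bigram(sentences, word2class)
pp = perplexity(sentences, word2class, prob)
print(round(pp, 3))
```

Changing `word2class` (for example, giving each city its own class) changes the perplexity, which is the kind of comparison between clusterings the abstract describes; word error rate, by contrast, requires a full recognizer and is not modeled here.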

Bibliographic reference. Moisa, Loreia / Giachin, Egidio (1995): "Automatic clustering of words for probabilistic language models", in EUROSPEECH-1995, 1249-1253.