Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


A Technique to Automatically Assign Parts-of-Speech to Words Taking into Account Word-Ending Information Through a Probabilistic Model

Giulio Maltese, Federico Mancini

IBM Semea Scientific Center, Roma, Italy

A system that automatically tags arbitrary text with the part-of-speech of each word is described. The system is based on a probabilistic model in which the words of a given sequence are assumed to be the output symbols of a Hidden Markov Model whose states are pairs of parts-of-speech. Using a 17-tag set, the rate of correctly tagged words ranged from 96.2% to 97.2% on various texts. The system proved quite effective even with a small set of initial statistics. For words that never occurred in the training data, we employed a statistical technique based on word-ending frequencies. This technique reduced the tagging error rate by 22% with a 260,000-word reference vocabulary and by 49% with a 20,000-word vocabulary.
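
The abstract does not spell out how the word-ending statistics are computed, so the following Python fragment is only a minimal sketch of the general idea, not the authors' method. It assumes that tag probabilities for an unseen word are estimated from the relative frequencies of tags observed with the longest known ending of that word. The function names (`train_ending_model`, `tag_distribution`), the `max_len` parameter, the tag labels, and the toy training data are all hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def train_ending_model(tagged_words, max_len=3):
    """Count how often each tag occurs with each word ending of length 1..max_len."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        w = word.lower()
        for k in range(1, min(max_len, len(w)) + 1):
            counts[w[-k:]][tag] += 1
    return counts


def tag_distribution(word, counts, max_len=3):
    """Return relative tag frequencies for the longest known ending of `word`.

    Returns an empty dict when no ending of the word was seen in training.
    """
    w = word.lower()
    for k in range(min(max_len, len(w)), 0, -1):
        ending = w[-k:]
        if ending in counts:
            total = sum(counts[ending].values())
            return {tag: n / total for tag, n in counts[ending].items()}
    return {}


# Toy, made-up training pairs (assumption: not data from the paper).
training = [
    ("quickly", "ADV"), ("slowly", "ADV"), ("family", "NOUN"),
    ("running", "VERB"), ("walking", "VERB"), ("building", "NOUN"),
]
model = train_ending_model(training)

# "jumping" is absent from the training pairs; its ending "ing" is not.
dist = tag_distribution("jumping", model)
best_tag = max(dist, key=dist.get)
```

In a tagger of the kind described above, a distribution like `dist` could stand in for the missing word-given-tag statistics of an out-of-vocabulary word; how the paper actually combines it with the Hidden Markov Model is not stated in the abstract.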


Bibliographic reference. Maltese, Giulio / Mancini, Federico (1991): "A technique to automatically assign parts-of-speech to words taking into account word-ending information through a probabilistic model", in EUROSPEECH-1991, 753-756.