5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Internal and External Tagsets in Part-of-Speech Tagging

Thorsten Brants

Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany

We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
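The split between the two tagsets can be illustrated with a small sketch. It assumes details the abstract does not give: a bigram hidden Markov model with Viterbi decoding (the paper's actual model order and estimation may differ), and a many-to-one map from each internal tag to exactly one external tag. All tag names (e.g. the hypothetical `DET_1`/`DET_2` and `NN_subj`/`NN_obj` refinements), probabilities, and function names are invented for illustration. The model decodes over internal tags, and only the mapped external tags are returned and scored.

```python
import math

# Hypothetical many-to-one map: each internal tag refines exactly one external tag.
INTERNAL_TO_EXTERNAL = {
    "DET_1": "DET", "DET_2": "DET",     # pre-verbal vs. post-verbal determiner
    "NN_subj": "NN", "NN_obj": "NN",    # subject-like vs. object-like noun
    "VB": "VB",
}

# Toy bigram transitions P(t_i | t_{i-1}) over INTERNAL tags; "<s>" is sentence start.
TRANS = {
    "<s>": {"DET_1": 0.9, "NN_subj": 0.1},
    "DET_1": {"NN_subj": 1.0},
    "NN_subj": {"VB": 1.0},
    "VB": {"DET_2": 0.8, "NN_obj": 0.2},
    "DET_2": {"NN_obj": 1.0},
    "NN_obj": {},
}

# Toy emissions P(w | t) over INTERNAL tags.
EMIT = {
    "DET_1": {"the": 1.0}, "DET_2": {"the": 1.0},
    "NN_subj": {"dog": 0.5, "cat": 0.5}, "NN_obj": {"dog": 0.5, "cat": 0.5},
    "VB": {"sees": 1.0},
}


def viterbi(words):
    """Most probable INTERNAL tag sequence, computed with log probabilities."""
    # best[t] = (log prob, path) of the best path ending in internal tag t
    best = {}
    for t in EMIT:
        p = TRANS["<s>"].get(t, 0.0) * EMIT[t].get(words[0], 0.0)
        if p > 0:
            best[t] = (math.log(p), [t])
    for w in words[1:]:
        new = {}
        for t in EMIT:
            e = EMIT[t].get(w, 0.0)
            if e == 0.0:
                continue
            cands = [(lp + math.log(TRANS[prev].get(t, 0.0)) + math.log(e), path + [t])
                     for prev, (lp, path) in best.items()
                     if TRANS[prev].get(t, 0.0) > 0]
            if cands:
                new[t] = max(cands)
        best = new
    if not best:
        raise ValueError("no tag sequence covers the input")
    return max(best.values())[1]


def tag(words):
    """Decode with the internal tagset, then output the EXTERNAL tags."""
    return [INTERNAL_TO_EXTERNAL[t] for t in viterbi(words)]


def external_accuracy(sentences):
    """Accuracy measured against gold EXTERNAL tags, as the abstract describes."""
    correct = total = 0
    for words, gold in sentences:
        for pred, g in zip(tag(words), gold):
            correct += pred == g
            total += 1
    return correct / total


sentence = ["the", "dog", "sees", "the", "cat"]
print(viterbi(sentence))  # ['DET_1', 'NN_subj', 'VB', 'DET_2', 'NN_obj']
print(tag(sentence))      # ['DET', 'NN', 'VB', 'DET', 'NN']
```

In this toy setup the internal refinements let the transition model distinguish a determiner after the verb from one at the start of the sentence, while the output stays in the coarser external tagset. Under the approach summarized above, the internal tagset can be changed freely as long as accuracy is measured on the external tags.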


Bibliographic reference. Brants, Thorsten (1997): "Internal and external tagsets in part-of-speech tagging", In EUROSPEECH-1997, 2787-2790.