September 22-25, 1997
We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
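The division of labour between the two tagsets can be illustrated with a minimal sketch. It is not the paper's implementation: the tag names, the split of an external VERB tag into two internal tags, and all probabilities below are hypothetical toy values chosen only for illustration. A bigram Markov model is decoded over the internal tags with the Viterbi algorithm, and the result is then projected onto the external tagset for output.

```python
import math

# Hypothetical internal tagset: the external tag VERB is refined into two
# internal tags. All names and numbers here are illustrative assumptions.
INTERNAL_TO_EXTERNAL = {
    "DET": "DET",
    "NOUN": "NOUN",
    "VERB_AUX": "VERB",
    "VERB_MAIN": "VERB",
}
TAGS = list(INTERNAL_TO_EXTERNAL)

# Toy model parameters (assumed, not estimated from any corpus).
START = {"DET": 0.8, "NOUN": 0.2}
TRANS = {
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB_AUX": 0.5, "VERB_MAIN": 0.5},
    "VERB_AUX": {"VERB_MAIN": 0.9, "DET": 0.1},
    "VERB_MAIN": {"DET": 1.0},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 1.0},
    "VERB_AUX": {"has": 1.0},
    "VERB_MAIN": {"barked": 0.7, "has": 0.3},
}

FLOOR = 1e-12  # stands in for zero probabilities so that log() is defined


def _log(p):
    return math.log(max(p, FLOOR))


def viterbi_internal(words):
    """Return the most probable internal tag sequence for the given words."""
    if not words:
        return []
    # Each column maps a tag to (best log-probability, backpointer).
    columns = [{
        t: (_log(START.get(t, 0.0)) + _log(EMIT[t].get(words[0], 0.0)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        prev_col = columns[-1]
        col = {}
        for t in TAGS:
            score, prev = max(
                (prev_col[p][0] + _log(TRANS[p].get(t, 0.0)), p) for p in TAGS
            )
            col[t] = (score + _log(EMIT[t].get(word, 0.0)), prev)
        columns.append(col)
    # Follow backpointers from the best final tag.
    path = [max(TAGS, key=lambda t: columns[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(columns[i][path[-1]][1])
    path.reverse()
    return path


def tag_external(words):
    """Decode with the internal tagset, then output external tags."""
    return [INTERNAL_TO_EXTERNAL[t] for t in viterbi_internal(words)]
```

With these toy values, `tag_external("the dog has barked".split())` yields `['DET', 'NOUN', 'VERB', 'VERB']`, while the model internally distinguishes `VERB_AUX` from `VERB_MAIN`. Accuracy would be measured on the external tags only, so the internal tagset can be changed freely without altering the output format.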
Bibliographic reference. Brants, Thorsten (1997): "Internal and external tagsets in part-of-speech tagging", In EUROSPEECH-1997, 2787-2790.