4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A Generalized LR Parser for Text-to-speech Synthesis

Per Olav Heggtveit

Telenor R&D, Kjeller, Norway

This paper reports the development of a parser for a Norwegian text-to-speech system. The parser applies the Generalized Left Right (GLR) algorithm [1], a generalization of the well-known LR algorithm for parsing computer languages. The paper briefly describes the GLR algorithm, the integration of a probabilistic scoring model, our implementation of the parser in C++, attribute structures, the lexical interface, and the application of the parser to part-of-speech (POS) tagging for Norwegian. Applied to a small test set of about 4 000 words, this method correctly tags 96 % of the known words, which is close to the performance of other POS taggers trained on large text databases [2] [3]. It tags 85 % of the unknown words correctly, and the probability of choosing the wrong pronunciation of a word from the lexicon is less than 0.1 %.

