EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Representation of Large Lexica Using Finite-State Transducers for the Multilingual Text-to-Speech Synthesis Systems

Matej Rojc, Zdravko Kacic

University of Maribor, Slovenia

Large external language resources used for multilingual text processing in TTS systems pose a serious problem because of the storage space they require and their slow look-up time. Representing large lexica as finite-state transducers is motivated mainly by considerations of space and time efficiency. In this paper we present a method for compiling large German phonetic and morphology lexica (CISLEX) [4], each containing about 300,000 words, into corresponding finite-state transducers (FSTs), together with the results obtained. For both lexica a substantial reduction in size and optimal access time were achieved. The initial size was 12.526 MB for the German phonetic lexicon and 18.49 MB for the morphology lexicon. The corresponding FSTs occupy only 2.78 MB for the phonetic lexicon and 6.33 MB for the morphology lexicon. At the same time, the look-up time is optimal, since it depends only on the length of the input word and not on the size of the lexicon.
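To illustrate why look-up time in a deterministic transducer depends only on the length of the input word, the following minimal sketch builds a toy, unminimized transducer over a handful of entries. It is not the compilation method of the paper: the class name ToyTransducer, the sample words, and their transcriptions are hypothetical, and a real lexicon transducer would additionally share suffixes and distribute outputs along arcs to obtain the size reduction.

```python
from typing import Dict, Optional


class ToyTransducer:
    """Hypothetical toy model: a deterministic, trie-shaped transducer.

    Each state has at most one outgoing arc per input symbol, so a
    look-up follows exactly one arc per character of the input word.
    """

    def __init__(self) -> None:
        # transitions[state][symbol] = next_state; state 0 is the start state
        self.transitions: Dict[int, Dict[str, int]] = {0: {}}
        # Output string attached to each accepting state
        self.final_output: Dict[int, str] = {}

    def add(self, word: str, output: str) -> None:
        """Insert one lexicon entry mapping `word` to `output`."""
        state = 0
        for ch in word:
            arcs = self.transitions[state]
            if ch not in arcs:
                new_state = len(self.transitions)
                self.transitions[new_state] = {}
                arcs[ch] = new_state
            state = arcs[ch]
        self.final_output[state] = output

    def lookup(self, word: str) -> Optional[str]:
        """Return the output for `word`, or None if it is not in the lexicon.

        The loop runs once per input character, regardless of how many
        entries the transducer holds.
        """
        state = 0
        for ch in word:
            next_state = self.transitions[state].get(ch)
            if next_state is None:
                return None
            state = next_state
        return self.final_output.get(state)


# Hypothetical sample entries (illustrative transcriptions only)
fst = ToyTransducer()
fst.add("haus", "h aU s")
fst.add("hause", "h aU z @")
fst.add("hand", "h a n t")

print(fst.lookup("haus"))
print(fst.lookup("hau"))
```

In this sketch, adding more entries increases the number of states but not the number of steps taken by lookup for a given word, which is the property the abstract refers to.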

Bibliographic reference.  Rojc, Matej / Kacic, Zdravko (2001): "Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems", In EUROSPEECH-2001, 2251-2254.