Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Learning Language Translation in Limited Domains Using Finite-State Models: Some Extensions and Improvements

J. M. Vilar (1), A. Marzal (2), Enrique Vidal (1)

(1) Depto. de Sist. Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, Valencia, Spain
(2) Depto. de Informática, Campus de Penyeta Roja, Universitat Jaume I, Castelló, Spain

The Onward Subsequential Transducer Inference Algorithm (OSTIA) has been used to learn language translation in limited-domain tasks. Although it is known to converge to the correct model when presented with enough training examples, the amount of training data required can be prohibitive for large vocabularies. We address this problem by appropriately clustering words in both the input and output languages. Experimental results show that this approach effectively avoids dependence on vocabulary size.
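
To illustrate why word clustering can reduce the amount of training data needed, here is a minimal sketch. It is not the authors' implementation: the cluster tables, the word dictionary, the example sentences, and the function names `categorize` and `expand` are all hypothetical, and the left-to-right restoration of category members is a simplifying assumption. Words belonging to a cluster are replaced by a category label before training, so sentence pairs that differ only in a clustered word collapse into a single training pair.

```python
from typing import Dict, List, Tuple

# Hypothetical clusters (word -> category label); not taken from the paper.
SRC_CLUSTERS: Dict[str, str] = {"rojo": "<COLOR>", "azul": "<COLOR>", "verde": "<COLOR>"}
TGT_CLUSTERS: Dict[str, str] = {"red": "<COLOR>", "blue": "<COLOR>", "green": "<COLOR>"}
# Hypothetical word-level dictionary used to restore category members.
WORD_DICT: Dict[str, str] = {"rojo": "red", "azul": "blue", "verde": "green"}


def categorize(sentence: str, clusters: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Replace clustered words by their label; return (template, replaced words)."""
    template: List[str] = []
    fillers: List[str] = []
    for word in sentence.split():
        if word in clusters:
            template.append(clusters[word])
            fillers.append(word)
        else:
            template.append(word)
    return template, fillers


def expand(template: List[str], fillers: List[str]) -> str:
    """Put translated fillers back into an output template, left to right."""
    remaining = iter(fillers)
    labels = set(TGT_CLUSTERS.values())
    return " ".join(
        WORD_DICT[next(remaining)] if tok in labels else tok for tok in template
    )


# Two raw sentence pairs that differ only in a clustered word.
pairs = [("un circulo rojo", "a red circle"), ("un circulo azul", "a blue circle")]

# After categorization both pairs become the same template pair.
training = {
    (tuple(categorize(src, SRC_CLUSTERS)[0]), tuple(categorize(tgt, TGT_CLUSTERS)[0]))
    for src, tgt in pairs
}

# A toy lookup table stands in for the learned transducer (an assumption).
template_model = dict(training)
src_template, src_fillers = categorize("un circulo verde", SRC_CLUSTERS)
translation = expand(list(template_model[tuple(src_template)]), src_fillers)
```

In this toy setup the word "verde" never appears in the training pairs, yet the template learned from "rojo" and "azul" still covers it. That is the intuition behind making the training requirement depend on the number of categories rather than the number of words.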


Bibliographic reference. Vilar, J. M. / Marzal, A. / Vidal, Enrique (1995): "Learning language translation in limited domains using finite-state models: some extensions and improvements", in EUROSPEECH-1995, 1231-1234.