Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

New Words: Effect on Recognition Performance and Incorporation Issues

I. Lee Hetherington

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

Previously, we demonstrated that new, out-of-vocabulary words occur in a wide variety of tasks no matter how large the system vocabulary, and we quantified the new-word rate for a number of tasks and corpora, showing that it depends on the task characteristics and vocabulary size [1]. In this paper we quantify the effects of new words on our system's accuracy and computation using carefully controlled experiments. We find that we encounter about 1.5 word errors per new word, some of which occur in neighboring in-vocabulary words. By examining the computation required to generate a word graph, we find that the occurrence of even a single new word increases computation by nearly a factor of four on average. Finally, we examine some of the issues related to the eventual automatic incorporation of new words. For our system, we examine the relative importance of training the acoustic models, pronunciation models, and the language model on exemplars of new words in order to optimize performance. We were able to assemble a system, completely untrained on the new words, that achieved an error rate, measured over a set of new words, only 1.6 times higher than that of a fully trained system.
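The new-word rate mentioned above can be illustrated with a minimal sketch. It assumes the rate is the fraction of running words in a test set that fall outside the system vocabulary; the abstract does not give the exact definition, so that reading, the function name `new_word_rate`, and the toy data are assumptions made for illustration, not the paper's method.

```python
def new_word_rate(tokens, vocabulary):
    """Return the fraction of running words that are not in the vocabulary.

    Assumption: "new-word rate" means out-of-vocabulary tokens divided by
    total tokens. The paper may define it differently (for example, per
    sentence rather than per word).
    """
    if not tokens:
        return 0.0
    new_count = sum(1 for token in tokens if token not in vocabulary)
    return new_count / len(tokens)


# Toy example (hypothetical data): one of five words is out of vocabulary.
vocabulary = {"show", "me", "flights", "to"}
tokens = ["show", "me", "flights", "to", "boston"]
rate = new_word_rate(tokens, vocabulary)
```

Growing the vocabulary lowers this rate, but, as the abstract notes, it does not fall to zero however large the vocabulary.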


Bibliographic reference.  Hetherington, I. Lee (1995): "New words: effect on recognition performance and incorporation issues", In EUROSPEECH-1995, 1645-1648.