4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Predicting the Out-of-Vocabulary Rate and the Required Vocabulary Size for Speech Processing Applications

Johannes Müller, Holger Stahl, Manfred Lang

Institute for Human-Machine-Communication, Munich University of Technology, Munich, Germany

This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. The original corpus is split into two sub-corpora, and the vocabulary and the OOV-rate are determined for that particular split. These values are averaged over all combinations of sub-corpora, and the averages can be approximated by analytic functions. These functions make it easy to predict the vocabulary size and the OOV-rate. The relative prediction error is below 4.6%.
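
The splitting-and-averaging procedure described above might be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the corpus is available as a list of tokenized segments, that one sub-corpus supplies the vocabulary while the other is used to measure the OOV-rate, and that averages are taken per number of vocabulary-building segments. The names `vocab_and_oov`, `averaged_curves` and `segments` and the toy sentences are hypothetical. The fit of analytic functions is left out because the abstract does not state their form.

```python
from itertools import combinations
from statistics import mean


def vocab_and_oov(train, test):
    """Return (vocabulary size of train, OOV-rate of test against that vocabulary)."""
    vocab = {tok for seg in train for tok in seg}
    test_tokens = [tok for seg in test for tok in seg]
    oov = sum(tok not in vocab for tok in test_tokens)
    return len(vocab), oov / len(test_tokens)


def averaged_curves(segments):
    """For each sub-corpus size k, average vocabulary size and OOV-rate
    over all ways of choosing k segments to build the vocabulary.

    Assumption: the remaining segments form the second sub-corpus.
    Enumerating every combination is only feasible for few segments.
    """
    n = len(segments)
    curves = []
    for k in range(1, n):
        results = []
        for idx in combinations(range(n), k):
            chosen = set(idx)
            train = [segments[i] for i in chosen]
            test = [segments[i] for i in range(n) if i not in chosen]
            results.append(vocab_and_oov(train, test))
        avg_vocab = mean(v for v, _ in results)
        avg_oov = mean(o for _, o in results)
        curves.append((k, avg_vocab, avg_oov))
    return curves


# Hypothetical toy corpus: three tokenized segments.
segments = [
    "show me flights to munich".split(),
    "show me trains to berlin".split(),
    "book a flight to munich".split(),
]
curves = averaged_curves(segments)
for k, avg_vocab, avg_oov in curves:
    print(k, avg_vocab, avg_oov)
```

Each tuple in `curves` gives one averaged point: the number of segments used to build the vocabulary, the mean vocabulary size, and the mean OOV-rate. Points of this kind are what an analytic function would then be fitted to in order to extrapolate beyond the existing corpus.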

Bibliographic reference. Müller, Johannes / Stahl, Holger / Lang, Manfred (1996): "Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications", in ICSLP-1996, 1922-1925.