4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Word Predictability After Hesitations: A Corpus-based Study

Elizabeth Shriberg, Andreas Stolcke

Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA

We ask whether lexical hesitations in spontaneous speech tend to precede words that are difficult to predict. We define predictability in terms of both transition probability and entropy, in the context of an N-gram language model. Results show that transition probability is significantly lower at hesitation transitions, and that this is attributable to both the following word and the word history. In addition, results suggest that fluent transitions in sentences with a hesitation elsewhere are significantly more likely than transitions in fluent sentences to contain out-of-vocabulary words and novel word combinations. Such findings could be used to improve statistical language modeling for spontaneous-speech applications.
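
As an illustration of the two predictability measures named in the abstract, the following minimal sketch computes a transition probability P(w | h) and the entropy of the next-word distribution for a toy bigram model. The abstract does not specify the N-gram order, the smoothing method, or the corpus used in the study, so the bigram order, the add-one smoothing, the toy sentences, and the function names (`train_bigram`, `transition_prob`, `entropy`) are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count bigrams over tokenized sentences, padded with <s> and </s>."""
    counts = defaultdict(Counter)
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts, vocab


def transition_prob(counts, vocab, prev, word):
    """Add-one smoothed P(word | prev); smoothing choice is an assumption."""
    total = sum(counts[prev].values())
    return (counts[prev][word] + 1) / (total + len(vocab))


def entropy(counts, vocab, prev):
    """Entropy (bits) of the next-word distribution after the history prev."""
    probs = (transition_prob(counts, vocab, prev, w) for w in vocab)
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical toy data, not taken from the paper's corpus.
toy = [
    ["i", "want", "uh", "that"],
    ["i", "want", "this"],
    ["i", "want", "that"],
]
counts, vocab = train_bigram(toy)

p_seen = transition_prob(counts, vocab, "i", "want")    # frequent transition
p_unseen = transition_prob(counts, vocab, "i", "this")  # never observed
h_after_i = entropy(counts, vocab, "i")        # history with one continuation
h_after_want = entropy(counts, vocab, "want")  # history with several
```

In this sketch, a low transition probability marks a specific next word as hard to predict, while a high entropy marks a history after which any next word is hard to predict. This mirrors the abstract's distinction between effects attributable to the following word and effects attributable to the word history.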

Bibliographic reference. Shriberg, Elizabeth / Stolcke, Andreas (1996): "Word predictability after hesitations: a corpus-based study", in ICSLP-1996, 1868-1871.