4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Automatic Linguistic Segmentation of Conversational Speech

Andreas Stolcke, Elizabeth Shriberg

Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA

As speech recognition moves toward more unconstrained domains such as conversational speech, we need to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such as sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level features for segmentation performance. Using only word-level information, we achieve 85% recall and 70% precision on linguistic boundary detection.
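
For readers unfamiliar with the metrics, recall and precision on boundary detection can be computed roughly as in the sketch below. It assumes each boundary is represented by the index of the word it follows. The function name and the toy data are illustrative only and are not taken from the paper.

```python
def boundary_precision_recall(reference, hypothesis):
    """Score hypothesized segment boundaries against reference boundaries.

    Both arguments are iterables of word indices after which a boundary
    occurs (an assumed representation, chosen for illustration).
    """
    ref = set(reference)
    hyp = set(hypothesis)
    correct = len(ref & hyp)  # boundaries present in both sets
    # Precision: fraction of hypothesized boundaries that are correct.
    precision = correct / len(hyp) if hyp else 0.0
    # Recall: fraction of reference boundaries that were found.
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Toy data (hypothetical): 4 reference boundaries, 5 hypothesized ones.
reference = [3, 7, 12, 20]
hypothesis = [3, 7, 9, 12, 15]
precision, recall = boundary_precision_recall(reference, hypothesis)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A segmenter that proposes boundaries liberally tends to raise recall at the cost of precision, which matches the pattern of the reported figures (85% recall, 70% precision).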

Bibliographic reference. Stolcke, Andreas / Shriberg, Elizabeth (1996): "Automatic linguistic segmentation of conversational speech", in ICSLP-1996, 1005-1008.