13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Sentence Detection Using Multiple Annotations

Ann Lee, James Glass

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA

In this paper, we develop a sentence boundary detection system that incorporates a prosodic model, word- and preterminal-level language models, and a global sentence-length model. An important aspect of this research was the investigation of crowdsourced punctuation annotations as a source of multiple references for evaluation. To evaluate the system, we propose a BLEU-like metric that compares a hypothesis to multiple references. Experiments on both reference transcripts and ASR output show that the global sentence-length model improves performance by 7.2% on reference transcripts and 3.8% on ASR output.

Index Terms: sentence boundary detection, prosody, finite-state transducer, Amazon Mechanical Turk


Bibliographic reference.  Lee, Ann / Glass, James (2012): "Sentence detection using multiple annotations", In INTERSPEECH-2012, 1848-1851.