Speech Prosody 2010
Chicago, IL, USA
This work explores prosodic/acoustic cues for improving a baseline phone segmentation module. The baseline segmentation is provided by a large vocabulary continuous speech recognition system. An analysis of the baseline results revealed problems in word boundary detection, which we tried to solve using post-processing rules based on prosodic features (pitch, energy and duration). These rules achieved better results in terms of inter-word pause detection, the durations of previously detected silent pauses, and the durations of phones at the initial and final positions of sentence-like units. These improvements may be relevant not only for retraining acoustic models, but also for the automatic punctuation task. Both tasks were evaluated, and the results obtained with the more reliable boundaries are promising. This work allows us to tackle more challenging problems, combining prosodic and lexical features for the identification of sentence-like units.
Index Terms: prosody, automatic phone segmentation, punctuation.
Bibliographic reference. Moniz, Helena / Batista, Fernando / Meinedo, Hugo / Abad, Alberto / Trancoso, Isabel / Mata, Ana Isabel / Mamede, Nuno (2010): "Prosodically-based automatic segmentation and punctuation", In SP-2010, paper 910.