We explore pre-silence syllabic lengthening as a cue for next-speakership prediction in spontaneous dialogue. When estimated using a transcription-mediated procedure, lengthening is shown to reduce error rates by 25% relative to majority class guessing. Lengthening should therefore be exploited by dialogue systems. With that in mind, we evaluate an automatic measure of spectral envelope change, Mel-spectral flux (MSF), and show that its speaker-independent performance is at least as good as that of the transcription-mediated measure. Modeling MSF is likely to improve turn uptake in dialogue systems, and to benefit other applications needing an estimate of durational variability in speech.
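To make the "25% relative to majority class guessing" figure concrete, the following minimal sketch shows how a relative error reduction of this kind is typically computed. The label names ("hold", "change"), the 60/40 class split, and the classifier error of 0.30 are hypothetical values chosen for illustration only; they are not taken from the paper, and the function names are ours.

```python
from collections import Counter


def majority_class_error(labels):
    """Error rate of always guessing the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)


def relative_error_reduction(baseline_error, model_error):
    """Fraction of the baseline error removed by the model."""
    if not 0.0 < baseline_error <= 1.0:
        raise ValueError("baseline_error must be in (0, 1]")
    return (baseline_error - model_error) / baseline_error


# Hypothetical data: 60 pre-silence intervals followed by the same
# speaker ("hold") and 40 followed by a speaker change ("change").
labels = ["hold"] * 60 + ["change"] * 40
baseline = majority_class_error(labels)

# Hypothetical error rate of a classifier that uses a lengthening cue.
model_error = 0.30

reduction = relative_error_reduction(baseline, model_error)
print(f"baseline error: {baseline:.2f}")
print(f"relative error reduction: {reduction:.2%}")
```

Under these assumed numbers, an absolute drop of ten points in error rate corresponds to a relative reduction of one quarter, because the reduction is measured against the baseline error rather than against 100%.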
Bibliographic reference. Hjalmarsson, Anna / Laskowski, Kornel (2011): "Measuring final lengthening for speaker-change prediction", In INTERSPEECH-2011, 2065-2068.