12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Dual Channel Coupled Decoder for Fillers and Feedback

Daniel Neiberg, Joakim Gustafson

KTH, Sweden

This study presents a dual channel decoder capable of modeling cross-speaker dependencies for segmentation and classification of fillers and feedbacks in conversational speech from the DEAL corpus. For the same number of Gaussians per state, we show improvements in average F-score with the successive addition of: 1) increased frame rate from 10 ms to 50 ms; 2) Joint Maximum Cross-Correlation (JMXC) features in a single channel decoder; 3) a joint transition matrix which captures dependencies symmetrically across the two channels; and 4) coupled acoustic model retraining symmetrically across the two channels. The final step gives a relative improvement of over 100% for fillers and feedbacks compared to our previously published results. The F-scores are high enough that the decoder can be used as both a voice activity detector and an illocutionary act decoder for semi-automatic annotation.
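
The abstract does not say how the joint transition matrix of step 3 is built, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes frame-aligned label sequences for the two channels, a hypothetical label set (LABELS), add-one smoothing, and symmetry obtained by counting every observed transition a second time with the two channels swapped. The function name joint_transition_matrix and its parameters are illustrative.

```python
from itertools import product

# Hypothetical label set; the labels actually used in the paper may differ.
LABELS = ("silence", "speech", "filler", "feedback")


def joint_transition_matrix(chan_a, chan_b, labels=LABELS, smoothing=1.0):
    """Estimate P(next joint state | current joint state).

    A joint state is the pair (label on channel A, label on channel B).
    Assumption: symmetry across channels is enforced by counting each
    transition twice, once as observed and once with the channels swapped.
    """
    if len(chan_a) != len(chan_b):
        raise ValueError("channels must be frame-aligned")

    states = list(product(labels, repeat=2))
    # Start every cell at `smoothing` so that unseen transitions keep
    # a small non-zero probability.
    counts = {s: {t: smoothing for t in states} for s in states}

    for x, y in ((chan_a, chan_b), (chan_b, chan_a)):
        joint = list(zip(x, y))
        for cur, nxt in zip(joint, joint[1:]):
            counts[cur][nxt] += 1.0

    # Normalize each row into a probability distribution.
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: c / total for t, c in counts[s].items()}
    return matrix


# Toy example: speaker A talks, speaker B gives feedback near the end of A's turn.
a = ["speech", "speech", "silence", "silence"]
b = ["silence", "feedback", "feedback", "silence"]
m = joint_transition_matrix(a, b)
print(m[("speech", "silence")][("speech", "feedback")])
```

With this construction, the probability of moving from (i, j) to (k, l) equals the probability of moving from (j, i) to (l, k), so neither channel is treated as privileged. A decoder would use such a matrix over the product state space of both speakers instead of two independent single-channel matrices.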


Bibliographic reference.  Neiberg, Daniel / Gustafson, Joakim (2011): "A dual channel coupled decoder for fillers and feedback", In INTERSPEECH-2011, 3097-3100.