4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

On the Robust Automatic Segmentation of Spontaneous Speech

Bojan Petek, Ove Andersen, Paul Dalsgaard

Center for PersonKommunikation, Aalborg University, Denmark

Results from applying an improved algorithm to the automatic segmentation of spontaneous telephone-quality speech are presented and compared with the results obtained when white noise is superimposed on the speech. Three segmentation algorithms are compared, all based on variants of the Spectral Variation Function (SVF). Experimental results are obtained on the OGI multi-language telephone speech corpus (OGI TS). We show that applying the auditory forward and backward masking effects prior to the SVF computation increases the robustness of the algorithm to white noise. When the average signal-to-noise ratio (SNR) is decreased to 10 dB, the peak ratio (defined as the ratio of the number of peaks measured at the target SNR to the number measured at the original SNR) increases by 16%, 12%, and 11% for the MFC (Mel-Frequency Cepstra), RASTA (RelAtive SpecTrAl processing), and FBDYN (Forward-Backward auditory masking DYNamic cepstra) SVF segmentation algorithms, respectively.
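
The peak ratio defined above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`add_white_noise`, `count_peaks`, `peak_ratio`), the Gaussian noise model, the simple local-maximum peak picker, and the peak counts in the usage note are all assumptions made for illustration. The abstract does not specify how the SVF curve is computed or how its peaks are detected, so the sketch takes an already computed curve as input.

```python
import math
import random


def add_white_noise(signal, target_snr_db, seed=0):
    """Add Gaussian white noise so the average SNR is about target_snr_db.

    Assumption: 'average SNR' is taken as the ratio of mean signal power
    to mean noise power over the whole utterance.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def count_peaks(curve, threshold=0.0):
    """Count local maxima above a threshold (a plateau is counted once).

    Assumption: this stands in for whatever peak picking is applied to
    the SVF curve; the abstract does not describe that step.
    """
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i] > threshold and curve[i - 1] < curve[i] >= curve[i + 1]
    )


def peak_ratio(peaks_at_target_snr, peaks_at_original_snr):
    """Ratio of peak counts at the target SNR over the original SNR."""
    return peaks_at_target_snr / peaks_at_original_snr
```

With hypothetical counts of 116 peaks at 10 dB and 100 peaks at the original SNR, `peak_ratio(116, 100)` gives a ratio of about 1.16, which corresponds to a 16% increase in the number of detected peaks. A more robust algorithm keeps this ratio closer to 1.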

Bibliographic reference.  Petek, Bojan / Andersen, Ove / Dalsgaard, Paul (1996): "On the robust automatic segmentation of spontaneous speech", In ICSLP-1996, 913-916.