First International Conference on Spoken Language Processing (ICSLP 90)
Speaker independence and context modelling are the key problems in automatic segmentation and alignment of continuous speech, and both are connected with the segmental concept of speech. This paper presents a new approach: a robust speaker-independent algorithm for this task. It aligns a phonetic transcription with a phoneme nucleus detector using the temporal decomposition (TD) paradigm. The algorithm works in three stages: a) predetection of candidate phoneme nucleus centers using an adaptive detection window; b) time alignment of the corresponding phonetic transcription using a TD-model-based Dynamic Time Warping (TD-DTW) procedure; c) adjustment of the resulting nucleus centers and detection of phoneme boundaries, also based on the TD model. A new temporal decomposition technique was also developed. The algorithm was trained on 200 sentences pronounced by one speaker and tested on 50 sentences pronounced by 7 speakers. On the test corpus, 86% of the candidate phoneme nucleus centers fall within a single manual segment, and 94% of the final nucleus centers match the manual segmentation.
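To illustrate the kind of alignment performed in stage b), the following is a minimal sketch of generic dynamic time warping in Python. It is not the TD-DTW procedure of the paper, whose local distance and path constraints are not given in this abstract. The function name `dtw_align`, the `cost` callback, and the toy label sequences are assumptions made only for illustration.

```python
from math import inf


def dtw_align(candidates, targets, cost):
    """Generic DTW between two sequences (illustrative, not the paper's TD-DTW).

    candidates: detected items, e.g. labels of nucleus candidates (hypothetical)
    targets:    items of the phonetic transcription (hypothetical)
    cost:       function (candidate, target) -> non-negative local distance
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(candidates), len(targets)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # D[i][j] = minimal cumulative cost of aligning the first i candidates
    # with the first j targets; row 0 and column 0 are padding.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(candidates[i - 1], targets[j - 1])
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        predecessors = [
            (D[i - 1][j - 1], i - 1, j - 1),  # advance in both sequences
            (D[i - 1][j], i - 1, j),          # extra candidate for same target
            (D[i][j - 1], i, j - 1),          # same candidate spans two targets
        ]
        _, i, j = min(predecessors, key=lambda p: p[0])
    path.reverse()
    return D[n][m], path


# Toy example: four detected candidates against a three-phoneme transcription.
total, path = dtw_align(
    ["a", "a", "b", "c"],
    ["a", "b", "c"],
    lambda x, y: 0.0 if x == y else 1.0,
)
print(total, path)
```

In this toy case the two leading "a" candidates are both mapped to the first transcription symbol, which is the kind of many-to-one matching that lets an over-generating detector be reconciled with a fixed transcription. A real system would replace the 0/1 cost with a distance computed on acoustic or TD-derived features.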
Bibliographic reference. Wang, H. D. / Bailly, Gérard / Tuffelli, D. (1990): "Automatic segmentation and alignment of continuous speech based on temporal decomposition model", In ICSLP-1990, 457-460.