EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Acoustical and Topological Experiments for an HMM-based Speech Segmentation System

Samir Nefti (1), Olivier Boeffard (2)

(1) France Télécom R&D, DIH/IPS/VMI, France
(2) IRISA, Université de Rennes 1, ENSSAT, France

Several specific tasks in the field of text-to-speech synthesis require a large amount of labeled speech corpora. Mostly, these labels correspond to phone marks aligned on the speech waveform. Different kinds of solutions have been applied to this problem, from rule-based systems to stochastic ones. We validate here a solution based on Hidden Markov Models. Various test configurations are proposed. At the acoustic level, we compare LSP to MFCC coefficients and assess the fitness of multi-Gaussians for this segmentation task. At the topological level, we compare standard left-to-right models to phonologically dependent topologies. The best configuration we found uses an MFCC analysis with standard left-to-right models and diagonal multi-Gaussians per state. For this configuration, the overall root mean squared error on the test database is 18 +/- 0.3 ms within a 99% confidence interval.
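
As an illustration of the evaluation metric quoted above, the following minimal sketch computes a root mean squared error, in milliseconds, between reference and automatically placed phone boundaries. The function name `boundary_rmse_ms`, the assumption that boundaries are already paired one-to-one, and the sample values are all hypothetical and not taken from the paper; the paper's confidence interval computation is not reproduced here.

```python
import math


def boundary_rmse_ms(reference_s, predicted_s):
    """Return the RMSE in milliseconds between paired boundary times in seconds.

    Assumes (hypothetically) that the two sequences are already aligned
    one-to-one, i.e. the i-th predicted boundary corresponds to the i-th
    reference boundary.
    """
    if len(reference_s) != len(predicted_s):
        raise ValueError("boundary lists must have the same length")
    if not reference_s:
        raise ValueError("at least one boundary pair is required")
    # Squared deviation of each predicted boundary from its reference.
    squared = [(p - r) ** 2 for r, p in zip(reference_s, predicted_s)]
    # Mean of the squared errors, square root, then seconds -> milliseconds.
    return 1000.0 * math.sqrt(sum(squared) / len(squared))


# Made-up boundary times in seconds, for illustration only.
reference = [0.100, 0.250, 0.400]
predicted = [0.110, 0.230, 0.400]
print(f"RMSE: {boundary_rmse_ms(reference, predicted):.2f} ms")
```

Squaring the deviations makes the measure sensitive to large boundary misplacements, which is one reason an RMS figure is a stricter summary than a mean absolute error.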


Bibliographic reference.  Nefti, Samir / Boeffard, Olivier (2001): "Acoustical and topological experiments for an HMM-based speech segmentation system", In EUROSPEECH-2001, 1711-1714.