Sixth ISCA Workshop on Speech Synthesis
This article describes a new unsupervised methodology for learning F0 classes using hidden Markov models (HMMs) on a syllable basis. An F0 class is represented by an HMM with three emitting states. The unsupervised clustering algorithm relies on an iterative process of Gaussian splitting and EM retraining. First, a single class is learnt on a training corpus (8000 syllables); it is then divided by perturbing the Gaussian means at successive levels. At each step, the mean RMS error is evaluated on a validation corpus (3000 syllables). The algorithm stops automatically when the error becomes stable or increases. The syllable is the reference level we have chosen for F0 modelling, although the methodology can be applied to other structures. Clustering quality is evaluated by cross-validation, using the mean of the RMS errors between the F0 contours of a test corpus and the estimated HMM trajectories. The results show that the classes are of good quality, with a mean RMS error of around 4 Hz.
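The split-and-retrain loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it makes several simplifying assumptions: each class is a fixed-length mean contour rather than a three-state HMM, retraining uses a k-means-style hard assignment in place of EM, and the function names, the perturbation `epsilon`, the stability threshold `tol`, and the toy data are all hypothetical.

```python
import math
import random

def rms(a, b):
    """RMS error (in Hz) between two equal-length F0 contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def mean_contour(contours):
    """Point-wise mean of a non-empty list of equal-length contours."""
    n = len(contours)
    return [sum(c[i] for c in contours) / n for i in range(len(contours[0]))]

def mean_rms(centroids, contours):
    """Mean over contours of the RMS error to the closest class centroid."""
    return sum(min(rms(c, m) for m in centroids) for c in contours) / len(contours)

def retrain(centroids, contours, iterations=10):
    """Hard-assignment re-estimation (a simplified stand-in for EM retraining)."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for c in contours:
            best = min(range(len(centroids)), key=lambda k: rms(c, centroids[k]))
            groups[best].append(c)
        # Keep the previous centroid when a class receives no contour.
        centroids = [mean_contour(g) if g else m for g, m in zip(groups, centroids)]
    return centroids

def split(centroids, epsilon=1.0):
    """Double the number of classes by perturbing each mean by +/- epsilon."""
    out = []
    for m in centroids:
        out.append([v + epsilon for v in m])
        out.append([v - epsilon for v in m])
    return out

def cluster(train, valid, tol=0.5, max_classes=64):
    """Split and retrain until the validation error is stable or increases."""
    centroids = [mean_contour(train)]  # start from a single class
    best_err = mean_rms(centroids, valid)
    while 2 * len(centroids) <= max_classes:
        candidate = retrain(split(centroids), train)
        err = mean_rms(candidate, valid)
        if best_err - err < tol:  # stable or worse: keep the previous classes
            break
        centroids, best_err = candidate, err
    return centroids, best_err

def synthetic(n, rng):
    """Hypothetical toy data: low rising and high falling contours plus noise."""
    shapes = [[100.0, 110.0, 120.0], [150.0, 140.0, 130.0]]
    return [[v + rng.gauss(0, 2.0) for v in rng.choice(shapes)] for _ in range(n)]

rng = random.Random(0)
train, valid = synthetic(200, rng), synthetic(80, rng)
classes, err = cluster(train, valid)
print(len(classes), round(err, 2))
```

In this sketch the validation set plays the role described in the abstract: a split is kept only if it lowers the mean RMS error by at least `tol`, so the number of classes is chosen by the data rather than fixed in advance.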
Bibliographic reference. Lolive, Damien / Barbot, Nelly / Boeffard, Olivier (2007): "Clustering algorithm for F0 curves based on hidden Markov models", In SSW6-2007, 85-89.