Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

Predicting segmental duration using Bayesian belief networks

Olga Goubanova

Centre for Speech Technology Research (CSTR), Edinburgh, UK

Modelling segment duration in text-to-speech systems is hindered by database imbalance and factor interaction. We propose a probabilistic Bayesian belief network (BN) approach to overcome both the data sparsity and the factor interaction problems. The belief network approach makes good estimates when data are missing or incomplete. It also captures factor interaction concisely, as causal relationships among the nodes of a directed acyclic graph (DAG). Furthermore, a belief network allows a significant reduction in the number of parameters to be estimated. In our work, we model segment duration as a hybrid Bayesian network consisting of discrete and continuous nodes; each node in the network represents a linguistic factor that affects segmental duration. The interaction between the factors is represented as conditional dependence relations in the graphical model. We contrasted the results of the belief network model with those of a sums-of-products (SoP) model and a classification and regression tree (CART) model. We trained and tested all three models on the same data. Our BN model of vowels performs better than the SoP model: the belief network achieves an RMS error of 3 milliseconds compared with 7 ms from SoP. The CART model also produces an error of 3 ms, so our new model is no worse in terms of final performance. The BN model for consonants also produces promising RMS error values: the BN gives a value of 2 milliseconds versus 4 ms for SoP and 1 ms for CART. The consonant BN architecture is not optimal in terms of correlation values; a search for a better model will be done in the future. However, we think our model has many other advantages compared to SoP; for instance, it is much easier to configure and to experiment with new features. This should make it easier to adapt to new languages.
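The claim about parameter reduction can be illustrated with a minimal sketch. It counts the free parameters of an unrestricted joint table over a few discrete factors and compares that with the count obtained when the joint distribution factorizes along a DAG. The factor names, their numbers of levels, and the graph structure are hypothetical and chosen only for illustration; they are not the network described in the paper, and the continuous duration node is left out for simplicity.

```python
from math import prod

# Hypothetical discrete factors and their numbers of levels (not from the paper).
cardinality = {"phone_class": 5, "stress": 3, "position": 4, "next_class": 5}

# Hypothetical DAG: each node maps to the list of its parents.
parents = {
    "phone_class": [],
    "stress": [],
    "position": ["stress"],
    "next_class": ["phone_class"],
}


def full_joint_params(card):
    """Free parameters of an unrestricted joint table over all factors."""
    return prod(card.values()) - 1


def factored_params(card, parents):
    """Free parameters when the joint factorizes along the DAG."""
    total = 0
    for node, pa in parents.items():
        # Number of parent configurations; the empty product is 1.
        parent_configs = prod(card[p] for p in pa)
        # One categorical distribution per parent configuration.
        total += (card[node] - 1) * parent_configs
    return total


full = full_joint_params(cardinality)
factored = factored_params(cardinality, parents)
print(f"full joint: {full} parameters, factored: {factored} parameters")
```

With these assumed factors, the factored form needs far fewer parameters than the full table, which is why sparse training data constrain it better.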

Bibliographic reference. Goubanova, Olga (2001): "Predicting segmental duration using Bayesian belief networks", in SSW4-2001, paper 139.