Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Automatic Feature Selection from a Large Number of Features for Phone Duration Prediction

Gabriel Webster, Sabine Buchholz, Javier Latorre

Toshiba Research Europe, Cambridge Research Laboratory, Cambridge, UK

The present research investigates automatic feature selection for phone duration prediction in text-to-speech (TTS) synthesis, selecting from a large set of 242 candidate features. Two methods for avoiding overfitting to the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross-validation combined with a simple per-feature improvement threshold achieved a duration prediction accuracy of 22.5 ms RMSE, a relative error reduction of 7.8% over a manually selected baseline feature set.
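To make the idea concrete, the following is a minimal sketch of feature selection driven by n-fold cross-validation with a per-feature improvement threshold. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the search strategy or the duration model, so greedy forward selection and an ordinary least-squares predictor are assumed here, and the names `cv_rmse`, `select_features`, and `min_gain` are hypothetical.

```python
import numpy as np


def cv_rmse(X, y, cols, n_folds=5):
    """Mean held-out RMSE of a least-squares model restricted to `cols`.

    Assumption: a linear model stands in for the duration predictor.
    """
    indices = np.arange(len(y))
    folds = np.array_split(indices, n_folds)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(indices, held_out)
        # Design matrices: a bias column plus the currently selected features.
        a_train = np.column_stack([np.ones(len(train)), X[train][:, cols]])
        a_test = np.column_stack([np.ones(len(held_out)), X[held_out][:, cols]])
        weights, *_ = np.linalg.lstsq(a_train, y[train], rcond=None)
        residual = a_test @ weights - y[held_out]
        errors.append(np.sqrt(np.mean(residual ** 2)))
    return float(np.mean(errors))


def select_features(X, y, n_folds=5, min_gain=0.02):
    """Greedy forward selection (assumed search strategy).

    A candidate feature is accepted only if it lowers the cross-validated
    RMSE by at least `min_gain` (a relative fraction, hypothetical default).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    best = cv_rmse(X, y, selected, n_folds)
    while remaining:
        scores = {j: cv_rmse(X, y, selected + [j], n_folds) for j in remaining}
        candidate = min(scores, key=scores.get)
        # Per-feature improvement threshold: stop when the gain is too small.
        if best - scores[candidate] < min_gain * best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best
```

Both safeguards act against overfitting in this sketch: cross-validation scores each candidate on held-out data, and the threshold rejects features whose gain is too small to be trusted. As a side note on the reported figures, if the 7.8% reduction is taken as (baseline - new) / baseline, the implied baseline RMSE would be roughly 22.5 / (1 - 0.078), or about 24.4 ms; this value is derived here and is not stated in the abstract.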

Index Terms: speech synthesis, phone duration prediction, automatic feature selection, feature set

Bibliographic reference. Webster, Gabriel / Buchholz, Sabine / Latorre, Javier (2010): "Automatic feature selection from a large number of features for phone duration prediction", in SP-2010, paper 013.