Speech Prosody 2010
Chicago, IL, USA
This paper investigates automatic feature selection for phone duration prediction in text-to-speech (TTS) synthesis, selecting from a large set of 242 candidate features. Two methods for avoiding overfitting to the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross-validation combined with a simple per-feature improvement threshold achieved a duration prediction accuracy of 22.5 ms RMSE, a relative error reduction of 7.8% over a manually selected baseline feature set.
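The selection procedure summarized above can be illustrated with a minimal sketch. The abstract does not specify the search strategy or the duration model, so the sketch assumes greedy forward selection and a least-squares linear predictor; the names `design`, `cv_rmse`, and `select_features`, the default fold count, and the absolute threshold in RMSE units are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def design(X, rows, cols):
    """Design matrix for the given rows: an intercept column plus the chosen features."""
    parts = [np.ones((len(rows), 1))]
    if cols:
        parts.append(X[np.ix_(rows, cols)])
    return np.hstack(parts)


def cv_rmse(X, y, cols, n_folds=5):
    """Mean held-out RMSE over n folds for a linear model (assumed predictor)."""
    indices = np.arange(len(y))
    errors = []
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        weights, *_ = np.linalg.lstsq(design(X, train, cols), y[train], rcond=None)
        residual = design(X, held_out, cols) @ weights - y[held_out]
        errors.append(np.sqrt(np.mean(residual ** 2)))
    return float(np.mean(errors))


def select_features(X, y, n_folds=5, threshold=0.01):
    """Greedy forward selection (assumed search strategy).

    At each step, add the candidate feature with the lowest cross-validated
    RMSE, and stop once the best candidate improves the current RMSE by less
    than `threshold`.
    """
    selected = []
    best = cv_rmse(X, y, selected, n_folds)  # intercept-only starting point
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cv_rmse(X, y, selected + [j], n_folds) for j in remaining}
        candidate = min(scores, key=scores.get)
        if best - scores[candidate] < threshold:
            break  # per-feature improvement too small: likely fitting noise
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best
```

In this sketch the two safeguards against overfitting work together: cross-validation scores each candidate on held-out data, and the threshold rejects candidates whose held-out gain is too small to be trusted.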
Index Terms: speech synthesis, phone duration prediction, automatic feature selection, feature set
Bibliographic reference. Webster, Gabriel / Buchholz, Sabine / Latorre, Javier (2010): "Automatic feature selection from a large number of features for phone duration prediction", in SP-2010, paper 013.