EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Accent Label Prediction by Time Delay Neural Networks Using Gating Clusters

Achim F. Müller (1), Rüdiger Hoffmann (2)

(1) Siemens, Germany
(2) Dresden University of Technology, Germany

In this paper, a new neural network (NN) architecture is presented for data-driven prediction of accent labels (perceptual accents and pitch accents) for speech synthesis. Within the proposed NN architecture, gating clusters are applied in a time delay (TD) framework. The gating clusters adapt the network structure dynamically so that only the input feature vectors actually available in the current context window are processed. The proposed NN architecture has been successfully applied to word-level accent label prediction within our text-to-speech (TTS) system. Prediction accuracy on our German corpus was 86.1%. On an English corpus, the achieved accuracy was 84.5%. This result is superior to results achieved on the same corpus with an approach based on classification and regression tree (CART) techniques [1]. The results were achieved with a simpler feature set than that used in [1].

[1] K. Ross and M. Ostendorf, "Prediction of abstract prosodic labels for speech synthesis"
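
The abstract does not spell out how the gating clusters operate, so the following is only an illustrative sketch of one possible reading, not the authors' implementation. It assumes one small cluster of units per context-window position and a binary gate that is 1 when that position falls inside the sentence and 0 otherwise, so that missing feature vectors (for example at sentence boundaries) contribute nothing. All names (`make_cluster`, `cluster_forward`, `gated_window_forward`), the tanh units, and the summation of cluster outputs are assumptions made for illustration.

```python
import math
import random


def make_cluster(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns) for one hypothetical cluster."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def cluster_forward(weights, x):
    """Output of one cluster: tanh of a weighted sum per unit (assumed activation)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def gated_window_forward(features, center, half_width, clusters):
    """Sum gated cluster outputs over the context window around word `center`.

    Assumption: the gate for a window position is 1 if a feature vector
    exists at that position (it lies inside the sentence) and 0 otherwise.
    Gated-off clusters are skipped entirely.
    """
    n_out = len(clusters[0])
    total = [0.0] * n_out
    gates = []
    for k, offset in enumerate(range(-half_width, half_width + 1)):
        pos = center + offset
        gate = 1 if 0 <= pos < len(features) else 0
        gates.append(gate)
        if gate:
            out = cluster_forward(clusters[k], features[pos])
            total = [t + o for t, o in zip(total, out)]
    return total, gates


# Usage: a three-word sentence with 2-dimensional word features (made-up values)
# and a five-word context window (half_width = 2).
rng = random.Random(0)
clusters = [make_cluster(2, 3, rng) for _ in range(5)]
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden, gates = gated_window_forward(sentence, center=0, half_width=2, clusters=clusters)
print(gates)  # the two left-context positions fall outside the sentence
```

In this sketch the first word of the sentence has no left context, so the two left-hand clusters are switched off and only the three clusters with real input contribute to the summed output. A real system would pass that output on to further layers that predict the accent label.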

Bibliographic reference.  Müller, Achim F. / Hoffmann, Rüdiger (2001): "Accent label prediction by time delay neural networks using gating clusters", In EUROSPEECH-2001, 549-553.