Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Decision Tree-Based Training of Probabilistic Concatenation Models for Corpus-Based Speech Synthesis

Shinsuke Sakai, Tatsuya Kawahara

Kyoto University, Japan

The measure of goodness, or cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of a concatenation is represented as the conditional probability of observing the spectral shape of a unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector is a linear transform of the past spectral shape. Phonetic decision tree-based parameter tying is performed to achieve robust training that balances model complexity against the amount of training data available. The concatenation models were implemented in a corpus-based speech synthesizer trained on a CMU Arctic database, and the effectiveness of the proposed method was confirmed by a subjective listening test.
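
As an illustration of the conditional Gaussian idea in the abstract, the following is a minimal sketch, not the authors' implementation. It scores a join as the negative log of a Gaussian density whose mean is a linear transform of the previous unit's spectral vector. The function name `concat_cost`, the parameter names `A`, `b`, and `Sigma`, and the inclusion of a bias term `b` are all assumptions made for this example. In the approach described above, one such parameter set would presumably be shared by all phonetic contexts tied to the same decision-tree leaf.

```python
import numpy as np


def concat_cost(x_prev, x_cur, A, b, Sigma):
    """Negative log-density of x_cur under N(A @ x_prev + b, Sigma).

    Hypothetical names: A (transform), b (bias, an assumption of this
    sketch), and Sigma (covariance) stand in for the parameters of one
    tied concatenation model.
    """
    mean = A @ x_prev + b                      # mean depends linearly on the past frame
    diff = x_cur - mean
    dim = diff.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)    # stable log-determinant
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    maha = diff @ np.linalg.solve(Sigma, diff)  # squared Mahalanobis distance
    return 0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)


if __name__ == "__main__":
    # Toy two-dimensional "spectral" vectors with an identity model.
    A, b, Sigma = np.eye(2), np.zeros(2), np.eye(2)
    prev = np.array([1.0, 2.0])
    smooth = concat_cost(prev, np.array([1.0, 2.0]), A, b, Sigma)
    rough = concat_cost(prev, np.array([3.0, 0.0]), A, b, Sigma)
    print(smooth, rough)
```

With these toy parameters, a candidate unit whose spectral vector matches the predicted mean receives a lower cost than one that departs from it, which is the behavior a unit-selection search would rely on when ranking joins.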

Bibliographic reference. Sakai, Shinsuke / Kawahara, Tatsuya (2006): "Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis", in INTERSPEECH-2006, paper 1564-Wed2A3O.2.