Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
In order to determine priorities for the improvement of timing in synthetic speech this study looks at the role of segmental duration prediction and the role of phonological symbolic representation in listeners' preferences. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models consisted of symbolic strings which were either derived from a database or a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic string is appropriate. Considering the relative importance of the sym- bolic representation, "post-lexical" segmental rules were investigated with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. As a conclusion, before fine-tuning the duration prediction, it is important to calculate an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
Bibliographic reference. Brinckmann, Caren / Trouvain, Jürgen (2001): "On the role of duration prediction and symbolic representation for the evaluation of synthetic speech ", In SSW4-2001, paper 137.