Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Statistical Analysis of Filled Pauses’ Rhythm for Disfluent Speech Synthesis

Jordi Adell (1), Antonio Bonafonte (1), David Escudero (2)

(1) Dpt. of Signal Theory and Communications, Universitat Politècnica de Catalunya, Spain
(2) Dpt. Computer Science, Universidad de Valladolid, Spain

Given that state-of-the-art speech synthesis systems have already reached a high level of naturalness, it is time to move from the current read-speech framework towards talking speech. This requires investigating how disfluencies can be included in synthetic speech, and whether they can even increase its naturalness. This paper builds on previously presented work and focuses on finding a local model of the rhythm of filled pauses. A statistical study of rhythm effects around filled pauses is presented and, based on the correlations between rhythm variables, a regression model is proposed to predict filled-pause duration and prepausal lengthening.
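The abstract does not specify which rhythm variables enter the regression or what form the model takes. The sketch below therefore assumes the simplest case: a single hypothetical predictor (the mean syllable duration before a filled pause) correlated with filled-pause duration, followed by an ordinary least-squares line fit. The function names `pearson` and `fit_line` and the toy values are illustrative only and are not data or code from this work.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical toy values in seconds, NOT measurements from the paper:
# mean syllable duration preceding each filled pause, and that pause's duration.
syllable_dur = [0.15, 0.18, 0.20, 0.22, 0.25]
fp_dur = [0.30, 0.36, 0.40, 0.44, 0.50]

r = pearson(syllable_dur, fp_dur)
slope, intercept = fit_line(syllable_dur, fp_dur)

# Predict the duration of a filled pause after a context with 0.21 s syllables.
print(round(slope * 0.21 + intercept, 2))  # prints 0.42
```

In practice a model of this kind would likely use several rhythm variables at once (multiple regression) and a separate target for prepausal lengthening; the one-predictor version only shows how a strong correlation motivates a linear predictor.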

