Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Pitch and Duration Transformation with Non-Parallel Data

Damien Lolive, Nelly Barbot, Olivier Boeffard

IRISA / University of Rennes 1; ENSSAT, Lannion, France

In a voice transformation context, prosody transformation using parallel corpora is unrealistic, as such corpora are difficult and expensive to build. Based on this observation, we propose an approach for transforming prosody from non-parallel corpora using the MLLR (Maximum Likelihood Linear Regression) adaptation strategy. The methodology is applied to the joint transformation of duration and F0 at the syllable level. The source data are modelled by a GMM, which is adapted to the target by applying a linear transformation to the mean vectors of the Gaussian mixture. The approach is applied to the conversion of duration and F0 between two French speakers and is evaluated by cross-validation between the models and the test datasets. Taking the target model as reference, the adaptation covers 80% of the way from the source data to the target data.
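
As an illustration of the adaptation step described above, the following minimal sketch applies one shared affine map to every mean vector of a source GMM, which is the form of update that MLLR mean adaptation produces. It is not the authors' implementation: the function name `adapt_means`, the two-dimensional feature layout (log-duration, F0) and all numeric values are hypothetical, and the maximum-likelihood estimation of the transform from target data is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def adapt_means(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Apply the shared affine map mu -> A * mu + b to every component mean."""
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Hypothetical source GMM with two components; each mean is
# (log-duration, F0 in Hz) for a syllable. Values are illustrative only.
source_means = [[-1.5, 110.0], [-1.0, 130.0]]

# Hypothetical transform: in MLLR, A and b would be estimated by maximising
# the likelihood of the target speaker's data; here they are simply assumed.
A = [[1.0, 0.0],
     [0.0, 1.5]]
b = [0.25, 20.0]

adapted_means = adapt_means(source_means, A, b)
print(adapted_means)
```

Because the same transform is shared by all components, only the mean vectors move; the mixture weights and covariances of the source model are left unchanged in this sketch.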

