Sixth ISCA Workshop on Speech Synthesis
This paper proposes a spectral conversion technique based on a new statistical model that includes time-sequence matching. In conventional Gaussian mixture model (GMM)-based approaches, Dynamic Programming (DP) matching between the source and target feature sequences is performed before the GMMs are trained. A frame-level similarity measure such as the Euclidean distance is typically adopted for this matching, but it may be inappropriate for converting spectral features. The likelihood function of the proposed model deals directly with two sequences of different lengths: the frame alignment between the source and target feature sequences is represented by discrete hidden variables. In the proposed algorithm, the maximum likelihood criterion is applied consistently to the training of model parameters, sequence matching, and spectral conversion. In a subjective preference test, the proposed method was superior to the conventional GMM-based method.
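For readers unfamiliar with the conventional preprocessing step mentioned above, the following is a minimal sketch of DP matching (dynamic time warping) between two feature sequences using the Euclidean distance as the frame-level measure. It illustrates only the generic baseline alignment, not the proposed model; the function names `euclidean` and `dtw_align`, the allowed step pattern, and the toy one-dimensional features are assumptions made for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(source, target):
    """Align two feature sequences by dynamic programming.

    Returns (total_cost, path), where path is a list of
    (source_index, target_index) pairs. The step pattern (diagonal,
    vertical, horizontal) is an assumed, commonly used choice.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j]: minimum accumulated distance aligning the first i
    # source frames with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end to recover the frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: sequences of different lengths with 1-dimensional frames.
source = [[0.0], [1.0], [2.0]]
target = [[0.0], [1.0], [1.0], [2.0]]
total, path = dtw_align(source, target)
print(total, path)
```

In a conventional pipeline, the aligned frame pairs in `path` would then be used as joint training data for the GMM. The alignment is fixed by the distance measure before any model is trained, which is the separation the abstract argues against.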
Bibliographic reference. Nankaku, Yoshihiko / Nakamura, Kenichi / Toda, Tomoki / Tokuda, Keiichi (2007): "Spectral conversion based on statistical models including time-sequence matching", In SSW6-2007, 333-338.