13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation

Daniel Erro, Eva Navas, Inma Hernáez

AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain

We present a method that determines the optimal configuration of a bilinear vocal tract length normalization function for transforming the frequency axis of one voice to match a specific target voice. Given a number of parallel utterances from the two speakers, the single parameter of this function can be calculated through an iterative procedure that minimizes an objective error measure defined in the cepstral domain. The method is also applicable when multiple warping classes are considered, and it can be complemented with amplitude correction filters. The resulting physically motivated cepstral transformation yields highly satisfactory conversion accuracy and improved quality with respect to standard statistical systems.
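As a rough illustration of the idea, the sketch below builds a linear cepstral warping matrix from the standard first-order all-pass (bilinear) frequency mapping and picks the warping factor that minimizes the squared cepstral error over time-aligned source/target frames. It is not the paper's estimator: the iterative MMSE procedure is replaced here by a plain grid search, and the function names, the least-squares construction of the matrix, the warping direction, and the number of frequency bins are all assumptions made for this example.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Frequency mapping of a first-order all-pass transform, omega in [0, pi]."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def cepstral_basis(omega, order):
    """Matrix mapping cepstral coefficients c_0..c_order to a log-spectrum on omega."""
    n = np.arange(order + 1)
    basis = 2.0 * np.cos(np.outer(omega, n))
    basis[:, 0] = 1.0  # c_0 contributes a constant term
    return basis

def warping_matrix(alpha, order, n_bins=257):
    """Least-squares cepstral-domain equivalent of warping the frequency axis.

    Assumption: the warped log-spectrum is S(w_alpha(omega)); the opposite
    convention simply flips the sign of alpha.
    """
    omega = np.linspace(0.0, np.pi, n_bins)
    plain = cepstral_basis(omega, order)
    warped = cepstral_basis(bilinear_warp(omega, alpha), order)
    return np.linalg.pinv(plain) @ warped

def estimate_alpha(src, tgt, candidates):
    """Grid search (a stand-in for the iterative procedure) for the alpha that
    minimizes the total squared error between warped source and target cepstra.

    src, tgt: arrays of shape (frames, order + 1) holding time-aligned cepstra.
    """
    order = src.shape[1] - 1
    errors = [np.sum((src @ warping_matrix(a, order).T - tgt) ** 2) for a in candidates]
    return float(candidates[int(np.argmin(errors))])
```

With parallel utterances, `src` and `tgt` would hold frame-aligned cepstral vectors of the two speakers (for instance after dynamic time warping), and `candidates` could be something like `np.linspace(-0.2, 0.2, 41)`. Handling multiple warping classes would amount to estimating one factor per class, which this sketch does not attempt.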

Index Terms: vocal tract length normalization, voice conversion, frequency warping plus amplitude scaling, speech synthesis.


Bibliographic reference. Erro, Daniel / Navas, Eva / Hernáez, Inma (2012): "Iterative MMSE estimation of vocal tract length normalization factors for voice transformation", in INTERSPEECH-2012, 86-89.