13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Modelling a Noisy-channel for Voice Conversion Using Articulatory Features

Bajibabu Bollepalli (1), Alan W. Black (2), Kishore Prahallad (1)

(1) Speech and Vision Lab, International Institute of Information Technology, Hyderabad, India
(2) Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper, we propose modeling a noisy channel for the task of voice conversion (VC). We use artificial neural networks (ANNs) to capture the speaker-specific characteristics of a target speaker, which avoids the need for any training utterances from a source speaker. We use articulatory features (AFs) as a canonical form, or speaker-independent representation, of the speech signal. Our studies show that AFs contain a significant amount of speaker information in their trajectories. We propose suitable techniques to normalize the speaker-specific information in AF trajectories, and the resulting AFs are used in voice conversion. The results of voice conversion, evaluated using objective and subjective measures, confirm that the speaker-specific characteristics of the target speaker can be captured.

Index Terms: voice conversion, articulatory features, noisy-channel model, speaker-independent representation
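
The idea outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the normalization technique, the network architecture, or the spectral features, so everything below is an assumption made for illustration. Specifically, per-dimension z-scoring stands in for the AF normalization, a tiny one-hidden-layer NumPy network stands in for the ANN, the data are synthetic, and the function names (`normalize_trajectories`, `train_mapper`, `convert`) are hypothetical. The point it shows is that the mapper is trained on target-speaker data only and is then applied to normalized AFs from a different speaker.

```python
import numpy as np

def normalize_trajectories(af):
    """Z-score each AF dimension over time (assumed normalization).

    af: array of shape (frames, af_dims).
    """
    mean = af.mean(axis=0)
    std = af.std(axis=0) + 1e-8  # guard against division by zero
    return (af - mean) / std

def train_mapper(af, spec, hidden=16, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network mapping AFs to spectral frames.

    Uses plain full-batch gradient descent on mean squared error.
    Returns the parameters and the loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (af.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, spec.shape[1]))
    b2 = np.zeros(spec.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(af @ w1 + b1)           # hidden activations
        err = h @ w2 + b2 - spec            # prediction error
        losses.append(float((err ** 2).mean()))
        g_pred = 2.0 * err / err.size       # d(loss)/d(prediction)
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (af.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), losses

def convert(params, af):
    """Map normalized AFs to spectral frames in the target speaker's space."""
    w1, b1, w2, b2 = params
    return np.tanh(af @ w1 + b1) @ w2 + b2

# Synthetic stand-ins for real features (assumption: 6 AF dims, 4 spectral dims).
rng = np.random.default_rng(1)
target_af = rng.normal(size=(200, 6))
target_spec = np.tanh(target_af @ rng.normal(size=(6, 4)))

# Train on the target speaker only; no source-speaker utterances are used.
params, losses = train_mapper(normalize_trajectories(target_af), target_spec)

# A "source speaker" simulated as the same trajectories with a per-speaker
# scale and offset, which z-scoring removes.
source_af = 1.5 * target_af + 0.7
converted = convert(params, normalize_trajectories(source_af))
```

In this toy setup the speaker difference is a pure scale and offset, so z-scoring removes it exactly; real AF trajectories would carry speaker information that such a simple normalization only partly removes.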


Bibliographic reference. Bollepalli, Bajibabu / Black, Alan W. / Prahallad, Kishore (2012): "Modelling a noisy-channel for voice conversion using articulatory features", in INTERSPEECH-2012, 2202-2205.