13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Implementation of Computationally Efficient Real-Time Voice Conversion

Tomoki Toda (1), Takashi Muramatsu (1), Hideki Banno (2)

(1) Graduate School of Information Science, Nara Institute of Science and Technology, Japan
(2) Graduate School of Science and Technology, Meijo University, Nagoya-shi, Aichi, Japan

This paper presents an implementation of real-time statistical voice conversion (VC) based on Gaussian mixture models (GMMs). To develop VC applications that enhance human-to-human speech communication, real-time conversion processing is essential. Reducing the computational complexity of the conversion processing is also useful for making VC applications available under limited computational resources. We propose a method for implementing real-time VC based on low-delay conversion processing that considers dynamic features and a global variance. We also propose computationally efficient VC processing based on fast source feature extraction and diagonalization of full covariance matrices. Experimental results show that the proposed methods work reasonably well.
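To make the idea of diagonalizing a full covariance matrix concrete, the following is a minimal sketch and not the authors' implementation; the abstract does not specify how the diagonalization is performed. It assumes a plain eigendecomposition of a single Gaussian component, and the function names (`diagonalize_covariance`, `log_gauss_full`, `log_gauss_diag`) are hypothetical. After rotating the feature vector into the eigenbasis, the log-density needs only element-wise operations instead of a linear solve.

```python
import numpy as np

def diagonalize_covariance(cov):
    """Eigendecompose a symmetric covariance so that cov = U @ diag(w) @ U.T.

    Assumption: eigendecomposition is used here for illustration only;
    the paper may diagonalize its covariance matrices differently.
    """
    w, U = np.linalg.eigh(cov)
    return w, U

def log_gauss_full(x, mean, cov):
    """Reference log-density using the full covariance matrix."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def log_gauss_diag(x, mean, w, U):
    """Same log-density, evaluated in the rotated (diagonal) space."""
    z = U.T @ (x - mean)            # rotate into the eigenbasis
    maha = np.sum(z * z / w)        # element-wise Mahalanobis distance
    logdet = np.sum(np.log(w))      # log-determinant from eigenvalues
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

# Usage with a synthetic, well-conditioned covariance (illustrative data only).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
cov = A @ A.T + 4.0 * np.eye(4)
mean = rng.standard_normal(4)
x = rng.standard_normal(4)
w, U = diagonalize_covariance(cov)
full_value = log_gauss_full(x, mean, cov)
diag_value = log_gauss_diag(x, mean, w, U)
```

Both functions return the same value up to floating-point error. The rotation itself is still a matrix-vector product, so in this sketch the saving comes from precomputing the eigendecomposition once per component rather than solving a linear system for every frame.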

Index Terms: voice conversion, real-time processing, low-delay conversion, computational efficiency

Bibliographic reference. Toda, Tomoki / Muramatsu, Takashi / Banno, Hideki (2012): "Implementation of computationally efficient real-time voice conversion", in INTERSPEECH-2012, 94-97.