This paper presents an approach to parametric voice conversion that can be used in real-time entertainment applications. The approach is based on spectral mapping using an artificial neural network (ANN) with rectified linear units (ReLU). To overcome the oversmoothing problem, a special network configuration is proposed that utilizes temporal states of the speaker. Speech is represented using the harmonic plus noise model, whose parameters are estimated from instantaneous harmonic parameters. The proposed voice conversion technique is compared to the main alternative approaches using objective and subjective measures.
Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Likhachov, Denis / Petrovsky, Alexander (2013): "Real-time voice conversion using artificial neural networks with rectified linear units", In INTERSPEECH-2013, 1032-1036.