EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Voice Transformations: From Speech Synthesis to Mammalian Vocalizations

Min Tang, Chao Wang, Stephanie Seneff

MIT Laboratory for Computer Science, USA

This paper describes a phase-vocoder-based technique for voice transformation. The method can flexibly manipulate various aspects of the input signal, e.g., pitch, duration, energy, and formant positions, without explicit F0 extraction. The modifications can target any of these feature dimensions and can vary over time. There are many potential applications for this technique. In concatenative speech synthesis, it can be applied to transform the voice characteristics of the speech corpus, or to smooth pitch or formant discontinuities at concatenation boundaries. The method can also be used in language learning: we can modify the prosody of the student's speech to match that of a native speaker, and use the result to guide improvements. The technique can also be used to convert other biological signals, such as killer whale vocalizations, into ones that are more appropriate for human auditory perception. Our experiments show encouraging results for all of these applications.
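For readers unfamiliar with the phase vocoder, the following is a minimal textbook-style sketch of one of the manipulations mentioned above: changing duration without changing pitch and without any F0 extraction. It is not the authors' implementation. The function names (`stft`, `istft`, `time_stretch`) and the parameters `n_fft`, `hop`, and `rate` are assumptions chosen for illustration, and pitch, energy, and formant modification are not covered.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Short-time Fourier transform with a Hann window (frames x bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    """Weighted overlap-add resynthesis matching stft() above."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Floor the normalizer so the tapered edges are not amplified.
    return out / np.maximum(norm, 1e-3)

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Change duration by a factor of 1/rate while keeping pitch.

    rate < 1 lengthens the signal, rate > 1 shortens it.
    """
    spec = stft(x, n_fft, hop)
    n_frames, n_bins = spec.shape
    # Phase advance each bin's centre frequency would produce over one hop.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    # Fractional analysis-frame positions to read from.
    positions = np.arange(0, n_frames - 1, rate)
    phase = np.angle(spec[0])
    out = np.empty((len(positions), n_bins), dtype=complex)
    for k, pos in enumerate(positions):
        i = min(int(pos), n_frames - 2)
        frac = pos - i
        # Interpolate magnitude between the two neighbouring frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation of the measured phase advance from the expected one,
        # wrapped to [-pi, pi], gives the per-bin instantaneous frequency.
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase so each bin keeps oscillating at that frequency.
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)
```

Calling `time_stretch(x, 0.5)` on a mono signal `x` returns a signal roughly twice as long with the same spectral content. In this family of methods, pitch modification is commonly obtained by combining such a stretch with resampling; whether the paper does so is not stated in the abstract.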


Bibliographic reference.  Tang, Min / Wang, Chao / Seneff, Stephanie (2001): "Voice transformations: from speech synthesis to mammalian vocalizations", In EUROSPEECH-2001, 353-356.