Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Improving Body Transmitted Unvoiced Speech with Statistical Voice Conversion

Mikihiro Nakagiri, Tomoki Toda, Hideki Kashioka, Kiyohiro Shikano

Nara Institute of Science & Technology, Japan

A method for converting Non-Audible Murmur (NAM) to ordinary speech based on statistical voice conversion (NAM-to-Speech) has been proposed toward the realization of a "silent speech telephone." Although NAM-to-Speech converts NAM into intelligible voices with quality similar to that of ordinary speech, a major problem remains: F0 is difficult to estimate from unvoiced speech. To avoid this problem, we propose a method for converting NAM to whisper, a familiar and intelligible form of unvoiced speech (NAM-to-Whisper). Moreover, we extend NAM-to-Whisper so that it accepts multiple types of body-transmitted unvoiced speech, such as NAM and Body Transmitted Whisper (BTW), as input. We evaluate the performance of the proposed conversion method. Experimental results demonstrate that 1) NAM-to-Whisper significantly improves the intelligibility and naturalness of NAM, 2) NAM-to-Whisper outperforms NAM-to-Speech, and 3) a single conversion model can be trained to successfully convert both NAM and BTW to the target voice.

Bibliographic reference. Nakagiri, Mikihiro / Toda, Tomoki / Kashioka, Hideki / Shikano, Kiyohiro (2006): "Improving body transmitted unvoiced speech with statistical voice conversion", In INTERSPEECH-2006, paper 1719-Thu1BuP.6.