13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Deviation Measure of Waveform Symmetry and its Application to High-speed and Temporally-fine F0 Extraction for Vocal Sound Texture Manipulation

Hideki Kawahara (1), Masanori Morise (2), Ryuichi Nisimura (1), Toshio Irino (1)

(1) Faculty of Systems Engineering, Wakayama Univ., Wakayama, Japan
(2) College of Information Science and Eng., Ritsumeikan Univ., Kusatsu, Shiga, Japan

A simple, high-speed F0 extractor with high temporal resolution is proposed, based on a waveform symmetry measure. Strictly speaking, it is not an F0 extractor. Instead, it is a detector of the lowest prominent sinusoidal component, with a salience measure. When the signal under investigation is a sum of harmonic sinusoidal components, it can make use of an F0 refinement procedure. The refinement procedure is based on a stable representation of the instantaneous frequency of periodic signals. Application of the proposed algorithm revealed that rapid temporal modulations in both the F0 trajectory and the spectral envelope typically exist in expressive voices, such as lively singing performances. Manipulation of these temporal fine structures (texture) effectively modified perceptual expressiveness, while somewhat preserving perceptual vocal effort and register.
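
As a rough illustration of the quantity the refinement procedure builds on, the sketch below estimates the instantaneous frequency of a single sinusoid from the phase increment of its analytic signal. This is the textbook definition only, not the stable representation or the symmetry-based detector proposed in the paper. The function names, the 16 kHz sampling rate, and the 220 Hz test tone are all assumptions made for the example.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array, computed via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the sample-to-sample phase increment."""
    z = analytic_signal(x)
    phase_step = np.angle(z[1:] * np.conj(z[:-1]))
    return phase_step * fs / (2.0 * np.pi)


# Hypothetical test signal: one second of a 220 Hz sinusoid sampled at 16 kHz.
fs = 16000
f0 = 220.0
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * f0 * t)

inst_f = instantaneous_frequency(x, fs)
median_f = float(np.median(inst_f))
print(f"median instantaneous frequency: {median_f:.2f} Hz")
```

For a single sinusoid this estimate stays close to the true frequency. For a sum of harmonics the raw instantaneous frequency fluctuates within each period, which is presumably why a stable representation is needed for refinement.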

Index Terms: speech analysis, speech synthesis, expressive speech, singing voices


Bibliographic reference.  Kawahara, Hideki / Morise, Masanori / Nisimura, Ryuichi / Irino, Toshio (2012): "Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation", In INTERSPEECH-2012, 386-389.