4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper addresses the problem of determining the fundamental frequency (F0) of a speech signal, and proposes four improvements to conventional frequency-domain methods. The major improvement is a multi-scale analysis that extends the range of F0 values that can be processed correctly. It builds on the lag-window method proposed by Sagayama (1978), hence the name "multi-lag-window". Secondly, a modification of the lag-window method itself improves its robustness to periodic noise, at the cost of losing its gain-independence property. Thirdly, a rescaling is introduced to permit a full Dynamic Programming search for the optimal F0 curve. Finally, a mathematically justified peak interpolation is proposed to replace the conventional, inaccurate parabolic interpolation. Together, these four improvements yield an accurate, robust, extended-range F0 determination method, which was tested on spontaneous speech from 20 speakers, with F0 values ranging from below 50 Hz to above 600 Hz.
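For orientation, the conventional parabolic peak interpolation mentioned in the abstract can be sketched as follows. This is not the multi-lag-window method and not the interpolation the paper proposes; it is only a minimal illustration of the baseline step, applied here to a plain autocorrelation peak. The function names `parabolic_peak` and `estimate_f0`, the default search range, and the use of an unwindowed autocorrelation are all assumptions made for the example.

```python
import numpy as np

def parabolic_peak(y, k):
    """Refine a local maximum at integer index k (1 <= k <= len(y) - 2)
    by fitting a parabola through y[k-1], y[k], y[k+1]."""
    a, b, c = y[k - 1], y[k], y[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        # The three points are collinear: no refinement is possible.
        return float(k)
    return k + 0.5 * (a - c) / denom

def estimate_f0(signal, sample_rate, f0_min=50.0, f0_max=600.0):
    """Illustrative single-frame F0 estimate: strongest autocorrelation
    peak within the lag range implied by [f0_min, f0_max] (hypothetical
    defaults), refined by parabolic interpolation."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Full autocorrelation; keep only the non-negative lags.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(r) - 2)
    k = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return sample_rate / parabolic_peak(r, k)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.05 * fs)) / fs
    tone = np.sin(2 * np.pi * 200.0 * t)  # synthetic 200 Hz test tone
    print(estimate_f0(tone, fs))
```

On a clean synthetic tone the estimate lands close to the true frequency, but the three-point parabola is only an approximation of the true peak shape, which is the kind of inaccuracy the abstract's fourth improvement is aimed at.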
Bibliographic reference. Geoffrois, Edouard (1996): "The multi-lag-window method for robust extended-range F0 determination", In ICSLP-1996, 2239-2242.