International Symposium on Tonal Aspects of Languages
This paper presents a method for skeletonising a fundamental frequency (F0) contour, that is, reducing it to its underlying F0 peaks and valleys without losing the linguistic and para-linguistic information that it conveys. The F0 peaks and valleys are mainly associated with the underlying lexical tones and can easily be converted into other features, such as the response time and amplitude of local F0 rise/fall movements. Given the F0 peaks and valleys, the exact shape of the F0 contour can then be recovered with a functional F0 model. Experiments were conducted on 668 Chinese utterances (around 1.4 hours of speech) from two native speakers. The validity of the proposed method is consistently supported by a three-fold evaluation: error analyses, the perceptual similarity between the re-synthesised tone and intonation and the original, and a listening test of the naturalness of synthetic speech in which the recovered F0 contours were incorporated into the unit selection process for synthesis.
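As a rough illustration of the idea of a peak-and-valley skeleton, the following Python sketch picks the strict local extrema of a sampled F0 contour and derives the amplitude of each rise/fall movement between them. It is not the authors' method: the abstract does not specify how peaks and valleys are located or which functional model recovers the contour, so the function names, the simple neighbour-comparison rule, and the sample contour values are all assumptions made for illustration.

```python
from typing import List, Tuple

Extremum = Tuple[int, str, float]  # (frame index, "peak" or "valley", F0 in Hz)


def find_peaks_and_valleys(f0: List[float]) -> List[Extremum]:
    """Return the strict local extrema of a sampled F0 contour.

    Assumption: a frame is a peak (valley) if it is strictly higher
    (lower) than both neighbours. Real contours would need smoothing
    and handling of unvoiced gaps, which is omitted here.
    """
    skeleton: List[Extremum] = []
    for i in range(1, len(f0) - 1):
        prev, cur, nxt = f0[i - 1], f0[i], f0[i + 1]
        if cur > prev and cur > nxt:
            skeleton.append((i, "peak", cur))
        elif cur < prev and cur < nxt:
            skeleton.append((i, "valley", cur))
    return skeleton


def movement_amplitudes(skeleton: List[Extremum]) -> List[float]:
    """Amplitude in Hz of each movement between consecutive extrema.

    Positive values are rises, negative values are falls.
    """
    return [b[2] - a[2] for a, b in zip(skeleton, skeleton[1:])]


# Hypothetical contour, one F0 value (Hz) per frame.
contour = [120.0, 135.0, 150.0, 140.0, 110.0, 125.0, 160.0, 145.0]
skeleton = find_peaks_and_valleys(contour)
amplitudes = movement_amplitudes(skeleton)
print(skeleton)
print(amplitudes)
```

In this toy setting the skeleton keeps three points out of eight frames, and the rise/fall amplitudes follow directly from consecutive extrema. Recovering the full contour from such points would require a functional F0 model, which is not sketched here.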
Bibliographic reference. Ni, Jinfu / Kawai, Hisashi (2004): "Skeletonising Chinese fundamental frequency contours with a functional model and its evaluation", In TAL-2004, 151-154.