The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
Pitch modelling is an important factor in speech synthesis, where the pitch contour plays a demonstrable role in the intelligibility and naturalness of synthesised speech. While quantitative models for pitch contours have been proposed previously, each of these has a fixed level of detail, and not all of them provide a basis either for automatic extraction of pitch model parameters or for measuring the distance between two instances of a model. In this paper, a novel and compact quantitative model for pitch contours is presented which covers the possible variations in pitch and can be automatically extracted. The minimum F0 value, the level global slope of a pitch segment and the semi-periodic jitter properties are used as pitch components and are modelled with a linear function, a sine function and a set of sine functions respectively. A distance measure is defined for the model which takes the shape of the contours into consideration. Experiments show a low mean square error (MSE) for the estimated contours for different languages across different corpora, and investigate the accuracy of the distance function on the model.
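As a rough illustration of the kind of decomposition the abstract describes, the sketch below evaluates a contour built from a linear term, a single sine term and a sum of further sine terms, and scores an estimate against a reference with the mean square error. The abstract gives no formulas, so the parameterisation (amplitude, frequency and phase for each sine), the function names `model_contour` and `mean_square_error`, and all numeric values are assumptions made for this example, not the authors' formulation. The shape-aware distance measure of the paper is not reproduced here.

```python
import math


def model_contour(times, f0_min, slope, segment_sine, jitter_sines):
    """Evaluate a hypothetical pitch contour at the given times (seconds).

    Assumed form: linear term + one sine + a sum of sines.
    Each sine is an (amplitude, frequency_hz, phase) tuple.
    """
    amp, freq, phase = segment_sine
    contour = []
    for t in times:
        value = f0_min + slope * t                               # linear component
        value += amp * math.sin(2 * math.pi * freq * t + phase)  # single sine component
        value += sum(a * math.sin(2 * math.pi * f * t + p)       # set of sines
                     for a, f, p in jitter_sines)
        contour.append(value)
    return contour


def mean_square_error(reference, estimate):
    """Mean square error between two equally sampled contours."""
    if len(reference) != len(estimate):
        raise ValueError("contours must have the same number of samples")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


# Illustrative values only: a one-second segment sampled every 10 ms.
times = [i * 0.01 for i in range(100)]
reference = model_contour(times, 100.0, 10.0, (20.0, 1.0, 0.0),
                          [(2.0, 15.0, 0.0), (1.0, 30.0, 0.5)])
# The same contour with the set-of-sines component left out.
smooth = model_contour(times, 100.0, 10.0, (20.0, 1.0, 0.0), [])

mse_exact = mean_square_error(reference, reference)
mse_smooth = mean_square_error(reference, smooth)
```

Here `mse_exact` is zero because the two contours are identical, while `mse_smooth` is small but non-zero because only the low-amplitude sine terms are missing from the estimate.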
Index Terms: pitch modelling, prosody modelling
Bibliographic reference. Abou-Zleikha, Mohamed / Cahill, Peter / Carson-Berndsen, Julie (2010): "An automatic pitch model with distance function", In SSW7-2010, 306-311.