First International Conference on Spoken Language Processing (ICSLP 90)
A novel statistical method is proposed in this paper for synthesizing the fundamental frequency (F0) contour of natural Mandarin speech. Taking advantage of the simple tone structure of Mandarin, a statistical model is defined to describe how the F0-contour patterns of monosyllables depend on phonetic features extracted from the input text. In training, the parameters of the model are estimated empirically from a set of sentential utterances. Phonological rules for synthesis are then extracted automatically from the training utterances and included implicitly in the model. In testing, the best sequence of F0-contour patterns for an input sentence is estimated from the model using a Viterbi search. The performance of this method was evaluated by simulation using nine repetitions of utterances of 112 declarative sentences spoken by a single speaker. Experimental results show that 77.56% of the synthesized F0-contour patterns of monosyllables coincide with the VQ-quantized versions of the corresponding patterns in the original natural speech. The naturalness of the synthesized speech was confirmed by an informal listening test.
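The Viterbi search step can be illustrated with a minimal sketch. The abstract does not give the model's exact factorization, so this example assumes a first-order (bigram) dependence between the F0-contour patterns of adjacent syllables, plus a per-syllable log-score for each pattern given that syllable's phonetic features. Pattern indices stand in for VQ codebook entries. The function name `viterbi_f0_patterns`, its arguments, and the toy probabilities are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def viterbi_f0_patterns(
    emission: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
    initial: Sequence[float],
) -> List[int]:
    """Return the highest-scoring sequence of F0-pattern indices, one per syllable.

    Hypothetical model (an assumption, not the paper's definition):
      emission[t][k]   : log P(pattern k | phonetic features of syllable t)
      transition[j][k] : log P(pattern k at syllable t | pattern j at syllable t-1)
      initial[k]       : log P(pattern k at the first syllable)
    """
    n_syll = len(emission)
    n_pat = len(initial)
    if n_syll == 0:
        return []
    # score[k]: best log-score of any pattern path ending in pattern k so far
    score = [initial[k] + emission[0][k] for k in range(n_pat)]
    backptr: List[List[int]] = []
    for t in range(1, n_syll):
        new_score: List[float] = []
        ptrs: List[int] = []
        for k in range(n_pat):
            # best predecessor pattern j for pattern k at syllable t
            best_j = max(range(n_pat), key=lambda j: score[j] + transition[j][k])
            new_score.append(score[best_j] + transition[best_j][k] + emission[t][k])
            ptrs.append(best_j)
        score = new_score
        backptr.append(ptrs)
    # trace back from the best pattern at the last syllable
    k = max(range(n_pat), key=lambda j: score[j])
    path = [k]
    for ptrs in reversed(backptr):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return path


# Toy usage: 3 syllables, 2 hypothetical patterns; probabilities converted to logs.
def to_log(rows):
    return [[math.log(p) for p in r] for r in rows]


emission = to_log([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
transition = to_log([[0.8, 0.2], [0.3, 0.7]])
initial = [math.log(0.5), math.log(0.5)]
best_path = viterbi_f0_patterns(emission, transition, initial)
print(best_path)  # [0, 1, 1]
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long sentences. The search costs O(T·K²) for T syllables and K patterns rather than the K^T cost of enumerating every sequence.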
Bibliographic reference. Chen, Sin-Horng / Lee, Su-Min / Chang, Saga (1990): "A Chinese fundamental frequency synthesizer based on a statistical model", In ICSLP-1990, 829-832.