Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
Appropriate control of fundamental frequency (F0) contours is extremely important for a high-quality synthesis-by-rule system. However, most conventional F0 control methods use only accent position and syntactic structure. Focusing on differences among parts of speech, we investigated the variation of F0 contours in speech spoken by a professional announcer. Adverbs showed wider and higher F0 values, whereas conjunctions showed lower F0. Demonstrative pronouns in the middle of a sentence were uttered with higher F0. Based on these observations, we proposed a new algorithm that takes the part of speech into account and implemented it in two speech synthesis systems: a synthesis-by-rule system and a PARCOR analysis-synthesis system. The evaluation of the synthesis-by-rule system did not show a significant improvement. In the PARCOR analysis-synthesis system, however, the new algorithm significantly improved naturalness for adverbs and conjunctions, but not for demonstrative pronouns. These results suggest that the new F0 contour algorithm, when combined with accurate duration control and advanced voice source modeling, can generate voice quality as high as that of analysis-synthesis systems.
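The idea of adjusting an F0 contour by part of speech can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. Only the direction of each adjustment follows the observations above (adverbs higher and wider, conjunctions lower, mid-sentence demonstrative pronouns higher); the function name `adjust_f0`, the table `POS_RULES`, the part-of-speech labels, the choice of a linear Hz scale, and every numeric factor are hypothetical placeholders.

```python
# Hypothetical (level, spread) multipliers per part of speech.
# Directions follow the abstract; the magnitudes are illustrative placeholders,
# not values reported in the paper.
POS_RULES = {
    "adverb": (1.10, 1.20),             # higher F0 and a wider F0 range
    "conjunction": (0.90, 1.00),        # lower F0
    "demonstrative_mid": (1.10, 1.00),  # higher F0, mid-sentence only
}


def adjust_f0(contour_hz, pos, mid_sentence=False):
    """Return a copy of contour_hz (F0 values in Hz) adjusted for pos.

    The level factor scales the whole contour; the spread factor widens
    or narrows the excursion around the contour's mean.
    """
    if not contour_hz:
        return []
    key = pos
    if pos == "demonstrative_pronoun":
        # Assumption: the rule applies only in mid-sentence position.
        key = "demonstrative_mid" if mid_sentence else None
    level, spread = POS_RULES.get(key, (1.0, 1.0))
    mean = sum(contour_hz) / len(contour_hz)
    return [level * (mean + spread * (f - mean)) for f in contour_hz]


if __name__ == "__main__":
    base = [100.0, 120.0, 100.0]
    for pos in ("noun", "adverb", "conjunction"):
        print(pos, [round(f, 1) for f in adjust_f0(base, pos)])
```

Parts of speech without an entry in the table pass through essentially unchanged, so a rule set of this kind could be layered on top of an existing accent-based and syntax-based F0 model rather than replacing it.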
Bibliographic reference. Hara, Noriyo / Tsubaki, Hisayoshi / Wakita, Hisashi (1992): "Fundamental frequency control using linguistic information", In ICSLP-1992, 1195-1198.