As a classic and intrinsic requirement, synthetic speech needs to convey correct information to listeners with good naturalness. Fundamental frequency (F0) contours must be controlled to meet this requirement. Tonal languages pose additional challenges because the F0 contour affects both the intelligibility and the naturalness of the speech. The tone nucleus model was successfully established on the observation that the F0 contour within a syllable conveys information asymmetrically. In this study, the tone nucleus model is applied to generate F0 contours for Thai speech synthesis. This is among the first studies to introduce the model to a tonal language other than Mandarin. Tone nuclei for all five distinctive tones are defined according to their underlying targets. The full process of F0 contour generation is presented, from tone nucleus extraction to F0 contour generation for continuous speech. The efficiency and adaptability of the model for Thai were confirmed by objective and subjective tests. The model outperformed a baseline that did not apply it: the generated F0 contours showed less distortion, higher tone intelligibility, and greater naturalness. A modified method is also introduced as an enhancement, and the results showed significant improvement in the generated F0 contours.
Bibliographic reference. Krityakien, Oraphan / Hirose, Keikichi / Minematsu, Nobuaki (2013): "Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model", In INTERSPEECH-2013, 1037-1041.