Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

CRF-based Statistical Learning of Japanese Accent Sandhi for Developing Japanese Text-to-Speech Synthesis Systems

Nobuaki Minematsu (1), Ryo Kuroiwa (2), Keikichi Hirose (2), Michiko Watanabe (1)

(1) Graduate School of Frontier Sciences; (2) Graduate School of Information Science and Technology; University of Tokyo, Japan

In Japanese, every content word has its own H/L pitch pattern when uttered in isolation, called its accent type. In a TTS system, this lexical information is usually stored in a dictionary and referred to for prosody generation. When a written sentence is converted to speech, however, this lexical H/L pattern often changes according to the context, a phenomenon known as word accent sandhi. This accent change is troublesome for speech synthesis researchers because even native speakers find it difficult to describe explicitly what mechanism governs the change, although young Japanese speakers learn the mechanism without trouble. To develop a good Japanese TTS system, this implicit phonological knowledge has to be built into the system. In our previous study [1], we developed a rule-based module for accent sandhi, but it produced a non-negligible number of errors. This paper describes the development of a corpus-based module that uses Conditional Random Fields (CRFs) to predict the change. Although the new module predicts the change better than the previous rule-based module, it is tuned further by integrating the rule-based knowledge acquired in the previous study.


  1. N. Minematsu, R. Kita, and K. Hirose (2003), "Automatic estimation of accentual attribute values of words for accent sandhi rules of Japanese text-to-speech conversion," Trans. IEICE, vol. E86-D, no. 3, pp. 550-557.


Bibliographic reference.  Minematsu, Nobuaki / Kuroiwa, Ryo / Hirose, Keikichi / Watanabe, Michiko (2007): "CRF-based statistical learning of Japanese accent sandhi for developing Japanese text-to-speech synthesis systems", In SSW6-2007, 148-153.