13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Improved Prediction of Japanese Word Accent Sandhi Using CRF

Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose

Graduate School of Information Science and Technology, The University of Tokyo, Japan

In Japanese, every content word has its own mora-based H/L pitch pattern when uttered in isolation, called its accent type. When a written sentence is read aloud, however, this lexical H/L pattern often changes according to context, a phenomenon known as word accent sandhi. In our previous work, an accent sandhi predictor was developed using conditional random fields (CRF). In this paper, the predictor is improved through feature engineering, focusing especially on phrases that include numerals and phrases that include loanwords, because our previous work showed that prediction performance was relatively low for those phrases. To optimize the features used for CRF, it is critical to take into account the mechanism of word accent sandhi. We review the linguistic and technical literature that has attempted to characterize accent sandhi in phrases including numerals and loanwords, and redesign the features to reflect these characteristics. Experiments show that the proposed predictor yields relative performance improvements of 37% and 41% for phrases including numerals and phrases including loanwords, respectively.

Index Terms: word accent sandhi, accent nucleus, text-to-speech, Japanese education, rule-based, corpus-based, CRF


Bibliographic reference. Minematsu, Nobuaki / Kobayashi, Shumpei / Shimizu, Shinya / Hirose, Keikichi (2012): "Improved prediction of Japanese word accent sandhi using CRF", in INTERSPEECH-2012, 2562-2565.