Speech Prosody 2002
A new method was developed to incorporate prosodic boundary information into statistical language modeling. The method counts word transitions separately for two cases: those that cross an accent phrase boundary and those that do not. Counting the two types of word transitions directly would require a large speech corpus, which is impractical to build, so bigram counts of part-of-speech (POS) transitions were first calculated separately for the two cases on a small speech corpus. Word bigram counts calculated from a large-scale text corpus were then divided between the two cases according to the POS transition feature, yielding two word bigram models, one for transitions crossing accent phrase boundaries and one for transitions not crossing them. The method was evaluated by the perplexity reduction of the proposed models relative to the baseline models. When correct boundary positions were used, the reduction reached 11%; when boundaries were extracted using our previously developed method based on mora-F0 transition modeling, it was 8%. A reduction of around 6% was still observed for speech uttered by a speaker other than the one whose speech was used to calculate the POS bigram counts.
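The count-splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule for dividing the counts, so the proportional split below is an assumption: each word bigram count is divided according to the fraction of its POS pair's transitions that crossed a boundary in the small speech corpus. The function names, the fallback value of 0.5 for unseen POS pairs, and the toy words and counts are all hypothetical.

```python
from collections import defaultdict


def crossing_ratio(pos_cross, pos_within):
    """Fraction of each POS bigram's transitions that cross an accent
    phrase boundary, estimated from the small speech corpus."""
    ratio = {}
    for pair in set(pos_cross) | set(pos_within):
        c = pos_cross.get(pair, 0)
        w = pos_within.get(pair, 0)
        if c + w > 0:
            ratio[pair] = c / (c + w)
    return ratio


def split_word_bigrams(word_counts, pos_of, ratio, default=0.5):
    """Divide text-corpus word bigram counts into crossing and
    non-crossing parts (assumed proportional split; `default` is a
    hypothetical fallback for POS pairs unseen in the speech corpus)."""
    cross = defaultdict(float)
    within = defaultdict(float)
    for (w1, w2), n in word_counts.items():
        r = ratio.get((pos_of[w1], pos_of[w2]), default)
        cross[(w1, w2)] += n * r
        within[(w1, w2)] += n * (1.0 - r)
    return dict(cross), dict(within)


# Toy data (invented for illustration only).
pos_cross = {("NOUN", "PARTICLE"): 1, ("PARTICLE", "NOUN"): 3}
pos_within = {("NOUN", "PARTICLE"): 3, ("PARTICLE", "NOUN"): 1}
pos_of = {"inu": "NOUN", "ga": "PARTICLE", "neko": "NOUN"}
word_counts = {("inu", "ga"): 100, ("ga", "neko"): 50}

ratio = crossing_ratio(pos_cross, pos_within)
cross_counts, within_counts = split_word_bigrams(word_counts, pos_of, ratio)
print(cross_counts)
print(within_counts)
```

Under this assumption, the two resulting count tables would each be normalized and smoothed in the usual way to give the crossing and non-crossing bigram models. At recognition time the model applied to a word transition would depend on whether a boundary was detected there.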
Bibliographic reference. Hirose, Keikichi / Minematsu, Nobuaki / Terao, Makoto (2002): "N-gram language modeling of Japanese using prosodic boundaries", In SP-2002, 395-398.