This paper describes NARSYS (NARration SYStem), a high-quality Japanese text-to-speech system. Earlier systems have two problems: misreadings and poorly articulated synthesized speech. To address the first problem, we introduce a high-accuracy word detection algorithm based on a DP matching method that uses bigram and unigram language models. To address the second, we introduce a wave-packet concatenation method that uses a tri-phoneme context-dependent wave-packet database. The 23,000 wave packets are manually extracted from natural speech. The accuracies of word-to-phoneme conversion and of bunsetsu* accent estimation are 99.8% and 95.9%, respectively, for sentences from several fields. The syllable articulation scores for the male and female voices are 88.9% and 73.4%, respectively. (* A bunsetsu is a short Japanese phrase consisting of a content word, either alone or followed by one or more function words.)
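As a rough illustration of word detection by dynamic programming with bigram and unigram language models, the following minimal sketch finds the most probable segmentation of an unspaced string. It is not the NARSYS algorithm, whose details the abstract does not give. The function name `segment`, the fixed `backoff` weight applied to unigram probabilities when a bigram is unseen, the `max_len` limit, and the toy probabilities are all assumptions made for this example.

```python
import math

BOS = "<s>"  # hypothetical sentence-start marker


def segment(text, unigram, bigram, backoff=0.4, max_len=8):
    """Return the best word sequence for `text`, or None if no segmentation exists.

    unigram: dict word -> probability
    bigram:  dict (previous word, word) -> conditional probability
    Unseen bigrams fall back to backoff * unigram probability (an assumed,
    unnormalized scheme chosen only to keep the sketch short).
    """
    def logp(prev, word):
        if (prev, word) in bigram:
            return math.log(bigram[(prev, word)])
        return math.log(backoff * unigram[word])

    n = len(text)
    # best[i][w] = (score, start, prev): best path covering text[:i] that ends in word w
    best = [dict() for _ in range(n + 1)]
    best[0][BOS] = (0.0, None, None)

    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word not in unigram:
                continue  # only dictionary words are candidates
            for prev, (score, _, _) in best[j].items():
                cand = score + logp(prev, word)
                if word not in best[i] or cand > best[i][word][0]:
                    best[i][word] = (cand, j, prev)

    if not best[n]:
        return None

    # Trace back from the best-scoring final word.
    word = max(best[n], key=lambda w: best[n][w][0])
    i, words = n, []
    while i > 0:
        _, j, prev = best[i][word]
        words.append(word)
        i, word = j, prev
    return words[::-1]


# Toy models (made-up probabilities) for an ambiguous kana string.
toy_unigram = {"くるま": 0.2, "で": 0.3, "まつ": 0.1, "くる": 0.2, "まで": 0.2}
toy_bigram = {("くる", "まで"): 0.5, ("まで", "まつ"): 0.5}
result = segment("くるまでまつ", toy_unigram, toy_bigram)
```

With these toy values, the bigram evidence favors the reading くる / まで / まつ over くるま / で / まつ, which shows how word-pair statistics can resolve a segmentation that a dictionary lookup alone leaves ambiguous.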
Bibliographic reference. Katae, Nobuyuki / Matsumoto, Tatsuro / Kimura, Shinta / Kaseda, Mitsuko / Ohyama, Takayuki (1995): "High-quality Japanese text-to-speech system: NARSYS", In EUROSPEECH-1995, 1861-1864.