Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

High-Quality Japanese Text-To-Speech System: NARSYS

Nobuyuki Katae (1), Tatsuro Matsumoto (1), Shinta Kimura (1), Mitsuko Kaseda (2), Takayuki Ohyama (2)

(1) Fujitsu Laboratories Ltd., Akashi, Japan; (2) Fujitsu Ltd., Machida, Japan

This paper describes a high-quality Japanese text-to-speech system called NARSYS (NARration SYStem). Earlier systems have two problems: misreadings and poorly articulated synthetic speech. To address the first problem, we introduce a high-accuracy word detection algorithm based on a DP matching method that uses bigram and unigram language models. To address the second, we introduce a wave-packet concatenation method that uses a tri-phoneme context-dependent wave-packet database. The 23,000 wave packets were extracted manually from natural speech. For sentences from several fields, the accuracies of word-to-phoneme conversion and bunsetsu* accent estimation are 99.8% and 95.9%, respectively. Syllable articulation scores for the male and female voices are 88.9% and 73.4%, respectively. (* A bunsetsu is a small Japanese phrase consisting of a content word alone or a content word together with some function words.)
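To make the idea of DP-based word detection with bigram and unigram language models concrete, here is a minimal sketch of generic Viterbi-style segmentation of an unsegmented string. It is not NARSYS's DP matching algorithm, whose details the abstract does not give. The lexicon, all probabilities, the `BACKOFF` weight, and the function names `log_prob` and `segment` are hypothetical, and romanized strings stand in for kana/kanji input for readability. The unigram fallback is a simple scaled estimate, not a properly normalized backoff model.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative only).
UNIGRAM = {
    "kyou": 0.05, "kyo": 0.02, "u": 0.01, "wa": 0.10,
    "ii": 0.04, "tenki": 0.03, "ten": 0.02, "ki": 0.02, "desu": 0.08,
}
# Hypothetical bigram probabilities P(word | prev) for a few observed pairs.
BIGRAM = {
    ("<s>", "kyou"): 0.3, ("kyou", "wa"): 0.5, ("wa", "ii"): 0.2,
    ("ii", "tenki"): 0.4, ("tenki", "desu"): 0.6,
}
BACKOFF = 0.4  # hypothetical weight applied when falling back to the unigram


def log_prob(prev, word):
    """Bigram log-probability, falling back to a scaled unigram if unseen."""
    if (prev, word) in BIGRAM:
        return math.log(BIGRAM[(prev, word)])
    return math.log(BACKOFF * UNIGRAM[word])


def segment(text, max_len=8):
    """Return the highest-scoring word sequence for text, or None.

    DP state = (end position, last word), so the bigram term can be
    applied exactly when extending a partial segmentation.
    """
    # best[i][w] = (score, (start, prev)) for the best path ending at i with word w
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word not in UNIGRAM:
                continue  # only dictionary words are candidates
            for prev, (score, _) in best[start].items():
                cand = score + log_prob(prev, word)
                if word not in best[end] or cand > best[end][word][0]:
                    best[end][word] = (cand, (start, prev))
    final = best[len(text)]
    if not final:
        return None  # no segmentation covers the whole input
    # Trace back from the best final word to the sentence-start symbol.
    word = max(final, key=lambda w: final[w][0])
    pos, words = len(text), []
    while word != "<s>":
        words.append(word)
        _, (start, prev) = best[pos][word]
        pos, word = start, prev
    return list(reversed(words))
```

For example, `segment("kyouwaiitenkidesu")` prefers `kyou / wa / ii / tenki / desu` over splits such as `kyo / u` or `ten / ki`, because the seen bigrams score far higher than the unigram fallbacks.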


Bibliographic reference.  Katae, Nobuyuki / Matsumoto, Tatsuro / Kimura, Shinta / Kaseda, Mitsuko / Ohyama, Takayuki (1995): "High-quality Japanese text-to-speech system: NARSYS", In EUROSPEECH-1995, 1861-1864.