Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

Prosodic unit selection using an imitation speech database

Joram Meron

Panasonic Speech Technology Laboratory, Santa Barbara, CA, USA

Starting with a rule-based prosody generation system, we try to improve the naturalness of the generated prosody by using a corpus-based approach, without losing the advantages of the rule-based method. To achieve this, a prosodic unit selection method is introduced, similar in approach to the waveform unit selection used by large unit inventory waveform concatenation systems.

To avoid the problem of incomplete unit description in existing prosodic databases, a new method of data collection and labeling is introduced. A small database of the proposed kind was collected, and results of applying the selection algorithm to it are given.

The approach described in this paper could be useful for improving prosody naturalness and for assisting in personalizing prosody. It requires relatively little expert manual work and can be used in small-footprint TTS systems.
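As a rough illustration of what unit selection over prosodic units can look like, the sketch below runs a generic dynamic-programming search that minimizes a target cost plus a join cost, as waveform unit selection systems commonly do. It is not the algorithm of the paper: the abstract does not give the unit description or the cost functions, so the `Unit` fields, `target_cost`, `join_cost`, and the F0 values are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical prosodic unit; the fields are assumptions, not the paper's."""
    label: str       # symbolic description, e.g. "stressed"
    f0_start: float  # F0 in Hz at the left edge of the unit
    f0_end: float    # F0 in Hz at the right edge of the unit


def target_cost(wanted: str, unit: Unit) -> float:
    # Assumed cost: 0 if the unit matches the requested description, else 1.
    return 0.0 if unit.label == wanted else 1.0


def join_cost(prev: Unit, nxt: Unit) -> float:
    # Assumed cost: size of the F0 jump at the boundary, scaled by 100 Hz.
    return abs(prev.f0_end - nxt.f0_start) / 100.0


def select_units(targets, database):
    """Return (units, total_cost) for the cheapest sequence, one unit per target."""
    # Each row holds, per database unit, (cheapest path cost, backpointer).
    rows = [[(target_cost(targets[0], u), None) for u in database]]
    for wanted in targets[1:]:
        prev_row = rows[-1]
        row = []
        for u in database:
            # Pick the predecessor that minimizes accumulated cost plus join cost.
            j = min(range(len(database)),
                    key=lambda k: prev_row[k][0] + join_cost(database[k], u))
            cost = prev_row[j][0] + join_cost(database[j], u) + target_cost(wanted, u)
            row.append((cost, j))
        rows.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(database)), key=lambda k: rows[-1][k][0])
    total = rows[-1][j][0]
    path = []
    for row in reversed(rows):
        path.append(database[j])
        j = row[j][1]
    path.reverse()
    return path, total


# Toy database with made-up values, for illustration only.
database = [
    Unit("stressed", 120.0, 150.0),
    Unit("stressed", 200.0, 220.0),
    Unit("unstressed", 150.0, 130.0),
    Unit("unstressed", 100.0, 90.0),
]
units, total = select_units(["stressed", "unstressed"], database)
```

In this toy case the search prefers the pair of units whose F0 contours meet at the boundary, which is the kind of trade-off between matching the requested description and joining smoothly that unit selection is meant to capture.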

Full Paper

Bibliographic reference. Meron, Joram (2001): "Prosodic unit selection using an imitation speech database", in SSW4-2001, paper 113.

Acoustic Examples (WAV format)

There are five pairs of sentences. In each pair, one sentence was produced using the rule-based prosody (rule_X.wav) and the other using the method proposed in this paper (imit_X.wav).
imit_1   rule_1
imit_2   rule_2
imit_3   rule_3
imit_4   rule_4
imit_5   rule_5