Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Expanding Phonetic Coverage in Unit Selection Synthesis Through Unit Substitution from a Donor Voice

Alistair Conkie, Ann K. Syrdal

AT&T Labs Research, USA

This paper describes experiments with synthetic voices built using unit selection [1] concatenative synthesis, in which portions of the database audio recordings are modified to produce a wider set of phonemes than the original voice recordings contain. Because global signal modification is known to significantly reduce perceived voice quality in speech synthesis [2] [3], the modifications we perform are confined to aperiodic portions of the signal, which tend neither to cause concatenation discontinuities nor to convey much of the speaker's individual character or affect.

We propose three methods to extend the phonetic coverage of unit selection voices: (1) modifying parts of a voice so that extra phones extracted from a donor voice can be added offline; (2) extending this methodology with a harmonic plus noise model (HNM) [4] for speech representation, in order to control aspects of the modification; and (3) combining recorded inventories from two voices so that, at synthesis time, units can be selected from either.
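
As a rough illustration of the third method only, the following Python sketch merges two unit inventories and runs a Viterbi search over the combined candidates. It is a toy under stated assumptions, not the system evaluated in the paper: the `Unit` fields, the single `pitch` feature, the `switch_penalty` for changing voice at a join, and the omission of a target cost are all hypothetical choices made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str    # phone label this unit realizes
    voice: str    # which recorded voice the unit comes from (hypothetical labels)
    pitch: float  # toy acoustic feature standing in for real join features


def combine_inventories(primary, donor):
    """Merge two lists of units into one inventory indexed by phone label."""
    inventory = {}
    for unit in list(primary) + list(donor):
        inventory.setdefault(unit.phone, []).append(unit)
    return inventory


def join_cost(prev, cur, switch_penalty):
    """Toy join cost: feature mismatch plus an assumed penalty for switching voice."""
    cost = abs(prev.pitch - cur.pitch)
    if prev.voice != cur.voice:
        cost += switch_penalty
    return cost


def select_units(targets, inventory, switch_penalty=5.0):
    """Viterbi search for the cheapest unit sequence (join costs only)."""
    if not targets:
        return []
    missing = [p for p in targets if p not in inventory]
    if missing:
        raise KeyError(f"no units for phones: {missing}")
    # best[i][j] = (cost of the best path ending at candidate j of target i, backpointer)
    best = [[(0.0, None) for _ in inventory[targets[0]]]]
    for i in range(1, len(targets)):
        row = []
        for cur in inventory[targets[i]]:
            options = [
                (best[i - 1][k][0] + join_cost(prev, cur, switch_penalty), k)
                for k, prev in enumerate(inventory[targets[i - 1]])
            ]
            row.append(min(options))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i]][j])
        j = best[i][j][1]
    return path[::-1]


# Toy data: the primary voice lacks the phone "x", which only the donor supplies.
primary = [Unit("k", "primary", 100.0), Unit("a", "primary", 110.0), Unit("t", "primary", 105.0)]
donor = [Unit("a", "donor", 180.0), Unit("x", "donor", 120.0)]
inventory = combine_inventories(primary, donor)
path = select_units(["k", "a", "x"], inventory)
print([(u.phone, u.voice) for u in path])
```

With this toy data the search stays in the primary voice wherever it can and takes only the missing phone from the donor. A real system would also need a target cost and perceptually motivated join features; the voice-switch penalty here is just one possible way to discourage unnecessary speaker changes.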

Experiments were conducted to evaluate the strengths and weaknesses of the three methods.


Bibliographic reference.  Conkie, Alistair / Syrdal, Ann K. (2006): "Expanding phonetic coverage in unit selection synthesis through unit substitution from a donor voice", In INTERSPEECH-2006, paper 2001-Wed2A3O.4.