Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
This paper proposes a two-step solution for generating natural prosody in TTS that requires neither prosody prediction nor prosody modification. First, a large, phonetically and prosodically enriched speech corpus is collected as the unit pool for the synthesizer. Second, a multi-tier non-uniform unit selection scheme picks the most suitable segments for concatenation from that pool. The final choice of units for an utterance is made by minimizing the overall concatenative cost of the whole utterance. Results from a subjective evaluation show that the average concatenative cost of a synthesized utterance is highly correlated with its naturalness.
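As an illustration of what minimizing a cost over a whole utterance can look like, the sketch below runs a generic dynamic-programming (Viterbi-style) search over candidate units. It is not taken from the paper: the abstract does not define the concatenative cost or the search procedure, so the function `select_units`, the `join_cost` callback, and the toy pitch values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Unit = TypeVar("Unit")


def select_units(
    candidates: Sequence[Sequence[Unit]],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Pick one candidate per position so the summed join cost is minimal.

    Hypothetical helper: `join_cost` stands in for whatever concatenative
    cost a real system would define between two adjacent units.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: cheapest total cost of any path ending at candidate j
    # of the position processed so far.
    best = [0.0] * len(candidates[0])
    back: List[List[int]] = []  # back[p][j]: best predecessor index

    for pos in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for cur in candidates[pos]:
            costs = [
                best[k] + join_cost(prev, cur)
                for k, prev in enumerate(candidates[pos - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min])
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()

    return [candidates[i][j] for i, j in enumerate(path)], total


# Toy data (assumed): each unit is just a pitch value in Hz, and the
# join cost is the pitch jump between neighbouring units.
toy_candidates = [[100, 150], [110, 180], [115, 90]]
chosen, total_cost = select_units(toy_candidates, lambda a, b: abs(a - b))
```

The point of the global search is that each unit is chosen in the context of its neighbours rather than greedily, position by position. A real system would replace the toy pitch-difference cost with its own measure of how well two segments join.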
Bibliographic reference. Chu, Min / Peng, Hu / Chang, Eric (2001): "A concatenative Mandarin TTS system without prosody model and prosody modification", In SSW4-2001, paper 115.