INTERSPEECH 2006 - ICSLP
We describe a unit selection technique for text-to-speech synthesis that jointly searches the space of possible diphone sequences and the space of possible prosodic unit sequences, in order to produce synthetic speech with more natural prosody. We demonstrate that this search, although currently computationally expensive, can achieve improved intonation compared to a baseline that searches only the space of possible diphone sequences. We also discuss ways in which the search could be made efficient enough for use in a real-time system.
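The abstract does not give the cost functions or search details, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes a standard Viterbi-style unit-selection search and forms the joint state at each position as a (diphone candidate, prosodic-unit candidate) pair, i.e. the Cartesian product of the two candidate lists. All names (`viterbi`, `joint_candidates`, `joint_target`, `joint_join`, `predicted_f0`), the toy F0-based costs, and the data are hypothetical. The product space shows one reason a joint search is expensive. With D diphone candidates and P prosodic candidates per position, each step of this search compares (DP)^2 state pairs instead of the D^2 pairs of a diphone-only search.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # search state: a diphone unit, or a (diphone, prosodic unit) pair


def viterbi(candidates: Sequence[Sequence[S]],
            target_cost: Callable[[int, S], float],
            join_cost: Callable[[S, S], float]) -> Tuple[float, List[S]]:
    """Dynamic-programming unit selection minimising summed target + join costs."""
    costs = [target_cost(0, c) for c in candidates[0]]
    backptrs: List[List[int]] = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_costs, bp = [], []
        for c in candidates[t]:
            # Cheapest route into candidate c from any candidate at position t-1.
            scores = [costs[i] + join_cost(prev[i], c) for i in range(len(prev))]
            j = min(range(len(prev)), key=scores.__getitem__)
            new_costs.append(scores[j] + target_cost(t, c))
            bp.append(j)
        costs = new_costs
        backptrs.append(bp)
    # Trace the lowest-cost path back from the final position.
    k = min(range(len(costs)), key=costs.__getitem__)
    best = costs[k]
    path = [candidates[-1][k]]
    for t in range(len(candidates) - 1, 0, -1):
        k = backptrs[t - 1][k]
        path.append(candidates[t - 1][k])
    path.reverse()
    return best, path


def joint_candidates(diphones: Sequence[Sequence[float]],
                     prosody: Sequence[Sequence[float]]) -> List[List[Tuple[float, float]]]:
    # Hypothetical joint state space per position: Cartesian product of both candidate lists.
    return [list(product(d, p)) for d, p in zip(diphones, prosody)]


def joint_target(t: int, s: Tuple[float, float]) -> float:
    # Toy cost: mismatch between the diphone's F0 and the prosodic unit's F0 target.
    return abs(s[0] - s[1])


def joint_join(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Toy cost: F0 jump between diphones plus a weighted jump between prosodic targets.
    return abs(a[0] - b[0]) + 0.5 * abs(a[1] - b[1])


# Hypothetical data: each number is a mean F0 in Hz.
diphones = [[110.0, 130.0], [115.0, 140.0], [100.0, 125.0]]  # diphone candidates
prosody = [[120.0, 135.0], [125.0], [105.0, 120.0]]          # prosodic-unit candidates
cost, path = viterbi(joint_candidates(diphones, prosody), joint_target, joint_join)

# Baseline: search diphones only, against one fixed (hypothetical) F0 prediction.
predicted_f0 = [125.0, 125.0, 110.0]
base_cost, base_path = viterbi(diphones,
                               lambda t, d: abs(d - predicted_f0[t]),
                               lambda a, b: abs(a - b))
```

In this sketch the baseline differs only in its state space and costs: each state is a single diphone scored against one fixed F0 prediction, rather than a diphone paired with a choice among several candidate prosodic units. Pruning the product space (for example with a beam) is one generic way to reduce the (DP)^2 cost, though the abstract does not say which techniques the authors consider.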
Bibliographic reference. Clark, Robert A. J. / King, Simon (2006): "Joint prosodic and segmental unit selection speech synthesis", In INTERSPEECH-2006, paper 1262-Tue3BuP.5.