INTERSPEECH 2006 - ICSLP
This paper presents a framework that accommodates the two most widely used contemporary speech synthesis techniques: unit selection and hidden Markov models (HMMs). It does so by building a very general HMM consisting of a network of states, each representing a single frame of a single unit. This network exactly mimics the behaviour of a unit selection system and in effect memorises the data as an HMM. States in the network can then be merged to produce a synthesis system of any desired size. The paper discusses this technique, a statistical formulation of the join cost, and a number of ways to represent the acoustic observations of the states.
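The idea of one state per frame, followed by merging, can be illustrated with a small sketch. The following is not the paper's algorithm: the names `State`, `build_frame_states` and `shrink`, the toy two-dimensional "frames", and the greedy closest-pair merging criterion are all assumptions made for illustration, and transitions between states (the network topology and join cost) are left out entirely.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class State:
    mean: List[float]  # average of the frames this state has absorbed
    count: int         # number of original frames behind this state


def build_frame_states(units):
    """Create one state per frame per unit, so the states memorise the data."""
    return [State(mean=list(frame), count=1)
            for frames in units.values()
            for frame in frames]


def sq_dist(a, b):
    """Squared Euclidean distance between two state means (assumed criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))


def merge(a, b):
    """Merge two states into one whose mean is the count-weighted average."""
    total = a.count + b.count
    mean = [(x * a.count + y * b.count) / total
            for x, y in zip(a.mean, b.mean)]
    return State(mean=mean, count=total)


def shrink(states, target_size):
    """Greedily merge the closest pair of states until target_size remain."""
    states = list(states)
    while len(states) > max(target_size, 1):
        i, j = min(combinations(range(len(states)), 2),
                   key=lambda p: sq_dist(states[p[0]], states[p[1]]))
        merged = merge(states[i], states[j])
        states = [s for k, s in enumerate(states) if k not in (i, j)]
        states.append(merged)
    return states


# Hypothetical toy database: unit name -> list of acoustic frames.
units = {
    "a_1": [(1.0, 0.0), (1.1, 0.0)],
    "a_2": [(1.0, 0.1)],
    "b_1": [(5.0, 5.0), (5.2, 5.0)],
}

full = build_frame_states(units)   # one state per frame
small = shrink(full, 2)            # a much smaller model of the same data
print(len(full), len(small))
```

At full size every state corresponds to exactly one stored frame, as in unit selection; as `target_size` shrinks, states pool more frames and the model moves towards a conventional, compact HMM.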
Bibliographic reference. Taylor, Paul (2006): "Unifying unit selection and hidden Markov model speech synthesis", In INTERSPEECH-2006, paper 1456-Wed2A3O.5.