Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Unifying Unit Selection and Hidden Markov Model Speech Synthesis

Paul Taylor

University of Cambridge, UK

This paper presents a framework that accommodates the two most widely used contemporary speech synthesis techniques, namely unit selection and hidden Markov models (HMMs). This is achieved by building a very general HMM consisting of a network of states, each representing a single frame of a single unit. This network exactly mimics the behaviour of a unit selection system and effectively memorises the data as an HMM. States in the network can then be merged to produce a synthesis system of any desired size. The paper discusses this technique, together with a statistical formulation of the join cost and a number of ways to represent the acoustic observations of the states.
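
The idea of a one-state-per-frame network that is later merged down to a chosen size can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not state the merging criterion, so the greedy merge of the two states with the closest means (squared Euclidean distance) is an assumption made for illustration only, as are the names State, build_frame_states and merge_states. Transitions between states are left out to keep the example short.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class State:
    mean: list[float]               # average of the frames absorbed by this state
    count: int                      # number of original frames merged into it
    frames: list[tuple[str, int]]   # (unit_id, frame_index) provenance


def build_frame_states(units):
    """Create one state per frame per unit, so the network memorises the data."""
    return [
        State(mean=list(frame), count=1, frames=[(unit_id, i)])
        for unit_id, frames in units.items()
        for i, frame in enumerate(frames)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_states(states, target_size):
    """Greedily merge the two closest states until target_size remain.

    Assumed criterion (not from the paper): distance between state means.
    """
    states = list(states)
    while len(states) > max(target_size, 1):
        i, j = min(
            combinations(range(len(states)), 2),
            key=lambda p: sq_dist(states[p[0]].mean, states[p[1]].mean),
        )
        a, b = states[i], states[j]
        n = a.count + b.count
        merged = State(
            # Count-weighted average keeps the mean equal to the frame average.
            mean=[(a.count * x + b.count * y) / n for x, y in zip(a.mean, b.mean)],
            count=n,
            frames=a.frames + b.frames,
        )
        states = [s for k, s in enumerate(states) if k not in (i, j)] + [merged]
    return states


# Hypothetical data: two units with two one-dimensional frames each.
units = {"a1": [[0.0], [1.0]], "a2": [[0.1], [5.0]]}
full_network = build_frame_states(units)      # unit-selection-like extreme
compact = merge_states(full_network, 3)       # smaller, more HMM-like system
```

With target_size equal to the number of frames, nothing is merged and every state still points at exactly one recorded frame; smaller values of target_size trade that fidelity for a more compact model.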


Bibliographic reference. Taylor, Paul (2006): "Unifying unit selection and hidden Markov model speech synthesis", in INTERSPEECH-2006, paper 1456-Wed2A3O.5.