In the present paper, a statistical approach to the representation of speech units for speech synthesis applications is proposed. The approach is based on two different Ergodic HMMs (EHMMs): an Acoustic EHMM (AEHMM) and a Phonetic EHMM (PhEHMM). The first represents the alphabet of elementary sounds. The second represents the correspondence between phonetic units, in the linguistic sense, and elementary sounds. In this way, the whole process of associating a sequence of spectral states with a sequence of phonetic units is viewed as a two-level stochastic process. The paper first describes the main characteristics of the proposed models, then shows how they can be included in an I/PC-based text-to-speech synthesizer, and finally reports some examples of synthetic speech.
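The two-level idea can be illustrated with a small sketch. It is not the authors' algorithm, because the abstract does not give their training or decoding procedure. The sketch makes the following assumptions: the hidden states are the AEHMM's elementary sounds, linked by ergodic (fully connected) transitions; the PhEHMM level is reduced to a hypothetical table `emit[i][u]` = P(phonetic unit u | sound i); and the input is a frame-level sequence of phonetic-unit labels. Under these assumptions, a standard Viterbi search returns the most likely sequence of elementary sounds. Each sound index could then be mapped to a spectral vector through a hypothetical codebook. All names and probabilities here are illustrative only.

```python
import math
from typing import List, Sequence


def viterbi_sounds(units: Sequence[int],
                   trans: List[List[float]],
                   emit: List[List[float]],
                   init: List[float]) -> List[int]:
    """Most likely elementary-sound (AEHMM state) path for a frame-level
    sequence of phonetic-unit labels (hypothetical two-level model).

    trans[i][j]: assumed P(sound j at t+1 | sound i at t)   (acoustic level)
    emit[i][u]:  assumed P(phonetic unit u | sound i)        (phonetic level)
    init[i]:     assumed P(sound i at t = 0)
    """
    def log(p: float) -> float:
        # Work in log space to avoid underflow on long sequences
        return math.log(p) if p > 0 else float("-inf")

    n = len(init)
    # delta[i]: best log-probability of any sound path ending in sound i
    delta = [log(init[i]) + log(emit[i][units[0]]) for i in range(n)]
    back: List[List[int]] = []  # back-pointers, one list per frame after the first

    for u in units[1:]:
        ptr, new = [], []
        for j in range(n):
            # Best predecessor sound for sound j under the ergodic transitions
            best_i = max(range(n), key=lambda i: delta[i] + log(trans[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + log(trans[best_i][j]) + log(emit[j][u]))
        back.append(ptr)
        delta = new

    # Trace back from the best final sound
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, take three toy sounds with self-loop probability 0.8 (`trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]`), `init = [0.5, 0.3, 0.2]`, `emit = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]`, and the unit sequence `[0, 0, 1, 1]`. The decoded sound path is `[0, 0, 1, 1]`. A hypothetical `codebook[s]` lookup would then turn that path into the sequence of spectral states passed to the synthesizer.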
Bibliographic reference. Giustiniani, M. / Pierucci, Piero (1991): "Phonetic ergodic HMM for speech synthesis", In EUROSPEECH-1991, 349-352.