Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This article covers text-to-speech synthesis research carried out at the CNET, based on the concatenation of acoustic units and the use of the TD-PSOLA method [MOU 90]. First, we present a technique for the automatic segmentation of a logatome database, which enables acoustic units to be extracted from that database. The logatomes are segmented into phonemes using Hidden Markov Models, and the synthesis units are extracted from this segmentation. The advantage of this technique is that manual segmentation of the training data is no longer necessary. We then present an algorithm for determining the best instant at which to concatenate two acoustic units. Results show that the difference between the manual and automatic segmentation of a French dictionary of units is less than 30 ms for 90% of the units. Two other dictionaries (one of them in German) were also segmented using this method. Informal listening tests were also carried out successfully.
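The agreement figure quoted above (a difference of less than 30 ms for 90% of the units) can be illustrated with a minimal sketch. It assumes one boundary time per unit, in milliseconds, for both the manual and the automatic segmentation. The function name `fraction_within_tolerance` and the sample values are hypothetical and are not taken from the paper, which may compute the figure differently.

```python
def fraction_within_tolerance(manual_ms, automatic_ms, tolerance_ms=30.0):
    """Return the share of units whose automatic boundary lies strictly
    within tolerance_ms of the manual boundary (assumed metric)."""
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("both segmentations must cover the same units")
    if not manual_ms:
        raise ValueError("at least one unit is required")
    within = sum(
        1 for m, a in zip(manual_ms, automatic_ms) if abs(m - a) < tolerance_ms
    )
    return within / len(manual_ms)


# Illustrative boundary times only (ms); these are not data from the paper.
manual = [120.0, 250.0, 400.0, 610.0, 800.0, 955.0, 1100.0, 1300.0, 1475.0, 1600.0]
automatic = [125.0, 240.0, 412.0, 600.0, 845.0, 950.0, 1120.0, 1290.0, 1480.0, 1605.0]

agreement = fraction_within_tolerance(manual, automatic)
print(f"{agreement:.0%} of units differ by less than 30 ms")
```

In this made-up sample, one unit out of ten deviates by 45 ms, so the remaining nine fall inside the 30 ms tolerance.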
Bibliographic reference. Boeffard, Olivier / Miclet, Laurent / White, S. (1992): "Automatic generation of optimized unit dictionaries for text to speech synthesis", In ICSLP-1992, 1211-1214.