INTERSPEECH 2006 - ICSLP
Text-to-speech implementations on embedded devices usually require low memory consumption and low computational complexity. Because of its simplicity, a formant synthesizer is still an attractive solution for some applications. The formant values and transitions are controlled by a set of rules, which assign control points for the synthesis parameters. This paper investigates the possibility of reducing the number of control points for formant contours from four to two per phoneme. The reduced model contains only the values at the end of the onset transition and at the beginning of the offset transition. Various interpolation techniques were studied, but linear interpolation was chosen for its simplicity. The 4- and 2-point models were compared in a listening evaluation test. The results show that reducing the number of control points has no effect on the perceived quality. The dynamic, context-dependent positioning of the two control points preserves the most essential information of the formant contours.
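As an illustration only, the reduction from four to two control points with linear interpolation might be sketched as follows. The abstract does not specify the layout of the 4-point model, so this sketch assumes the four points are the phoneme start, the end of the onset transition, the beginning of the offset transition, and the phoneme end. The function names, time stamps, and frequency values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A control point is (time in ms, formant frequency in Hz).
ControlPoint = Tuple[float, float]


def reduce_to_two(four_points: Sequence[ControlPoint]) -> List[ControlPoint]:
    """Keep only the end of the onset transition and the beginning of the
    offset transition. ASSUMPTION: these are the 2nd and 3rd of the four
    points; the paper's actual 4-point layout may differ."""
    if len(four_points) != 4:
        raise ValueError("expected exactly four control points per phoneme")
    return [four_points[1], four_points[2]]


def interpolate_contour(points: Sequence[ControlPoint],
                        times: Sequence[float]) -> List[float]:
    """Piecewise-linear interpolation through the control points.
    Outside the control-point range the nearest end value is held."""
    pts = sorted(points)
    contour = []
    for t in times:
        if t <= pts[0][0]:
            contour.append(float(pts[0][1]))
        elif t >= pts[-1][0]:
            contour.append(float(pts[-1][1]))
        else:
            # Find the segment that contains t and interpolate within it.
            for (t0, f0), (t1, f1) in zip(pts, pts[1:]):
                if t0 <= t <= t1:
                    contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
                    break
    return contour


# Hypothetical first-formant control points for two consecutive phonemes.
phonemes_4pt = [
    [(0, 500.0), (20, 700.0), (60, 700.0), (80, 600.0)],
    [(80, 600.0), (100, 400.0), (140, 400.0), (160, 450.0)],
]

points_4 = [p for phoneme in phonemes_4pt for p in phoneme]
points_2 = [p for phoneme in phonemes_4pt for p in reduce_to_two(phoneme)]

frame_times = list(range(0, 161, 10))  # one value every 10 ms
contour_4 = interpolate_contour(points_4, frame_times)
contour_2 = interpolate_contour(points_2, frame_times)

# Largest deviation between the two contours, in Hz.
max_deviation = max(abs(a - b) for a, b in zip(contour_4, contour_2))
```

In this sketch the 2-point contour interpolates directly from the offset point of one phoneme to the onset point of the next, so the stored data per formant is halved and the two contours differ only near the phoneme boundaries.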
Bibliographic reference. Pärssinen, Kimmo / Moberg, Marko (2006): "Evaluation of perceptual quality of control point reduction in rule-based synthesis", In INTERSPEECH-2006, paper 1178-Wed3BuP.12.