4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In previous work, we proposed an automatic, data-driven methodology for modeling both fundamental frequency and segmental duration in TTS converters from a single-speaker recorded corpus. Its advantage was that it could be adapted to a specific corpus or a particular speaker; its main disadvantage was the size of the resulting prosodic database. In this paper, we propose statistical methods for reducing the prosodic database that this methodology requires. A 50% reduction can be obtained without compromising the naturalness of the synthetic speech produced by our previous methodology with the same prosodic corpus. The trade-off between variability and reduction in prosodic contours is also discussed.
Bibliographic reference. López-Gonzalo, E. / Rodríguez-García, J. M. (1996): "Statistical methods in data-driven modeling of Spanish prosody for text to speech", In ICSLP-1996, 1377-1380.