5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Simplification of TTS Architecture vs. Operational Quality

Eric Keller

Laboratoire d'analyse informatique de la parole (LAIP), Faculté des Lettres, Université de Lausanne, Switzerland

Many applications in mobile telephony and portable computing require high-quality speech synthesis systems with a very modest computational footprint. Our text-to-speech system for French gives satisfactory performance in phonetisation and prosody with considerably reduced computational resources. Using the Mons (Belgium) diphone database, the current version of the program runs in real time on Pentium-type PCs or Mac PPCs. The code requires 442 k, the minimum RAM requirement is 4700 k, and the minimum disk requirement is 5560 k. Phonetisation and prosody processing have been brought to a first level of optimal compromise between quality and computational footprint. Major further reductions in space requirements would probably necessitate a re-evaluation of sound generation procedures.


Bibliographic reference. Keller, Eric (1997): "Simplification of TTS architecture vs. operational quality", in EUROSPEECH-1997, 585-588.