Sixth ISCA Workshop on Speech Synthesis
The present paper focuses on concatenative speech synthesis and aims to determine and compare how the choice of unit type in the unit selection approach influences the quality of the synthesized speech. Several unit types can be used for this purpose; this work deals with the most widely used ones, i.e. halfphones, diphones, phones, triphones and syllables. Speech was synthesized using each of these unit types, and the output was presented to a number of listeners, whose task was to evaluate the quality of the synthetic speech. The results of the listening test, performed for the Czech language, are presented. It can be assumed, however, that the results would probably be the same for other languages with a similar structure, as we made no language-dependent modifications to the Festival system. To our knowledge, no research of a similar character has been conducted yet, so this evaluation should suggest which unit types are appropriate for general TTS systems.
Bibliographic reference. Gruber, Martin / Tihelka, Daniel / Matousek, Jindrich (2007): "Evaluation of various unit types in the unit selection approach for the Czech language using the Festival system", In SSW6-2007, 276-281.