Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Are Rule-based Syllabification Methods Adequate for Languages with Low Syllabic Complexity? The Case of Italian

Connie R. Adsett (1), Yannick Marchand (2)

(1) Institute for Biodiagnostics (Atlantic), National Research Council Canada, Halifax, Nova Scotia, Canada
(2) Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This assumption has recently been shown to be incorrect for English, where data-driven algorithms outperform rule-based methods. It is possible, however, that data-driven methods are only better for languages with complex syllable structures. In this paper, three rule-based automatic syllabification systems and two data-driven ones (Syllabification by Analogy and the Look-Up Procedure) are compared on a language with lower syllabic complexity: Italian. Using a leave-one-out procedure on 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule-based method correctly syllabified 89.77% of words. These results show that data-driven methods can also outperform rule-based methods on Italian syllabification, indicating that data-driven methods may be the best approach to the syllabification component of speech synthesis systems.
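The leave-one-out evaluation and the word-accuracy measure mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the hyphen-delimited input format, the four-word toy lexicon, and in particular the letter-pair voting learner, which is a deliberately simple stand-in and is neither Syllabification by Analogy nor the Look-Up Procedure. A word counts as correct only if every boundary in it matches the reference.

```python
from collections import Counter


def boundaries(syllabified):
    """Split 'ta-vo-lo' into ('tavolo', {2, 4}): the bare word and cut positions."""
    word = syllabified.replace("-", "")
    cuts, pos = set(), 0
    for ch in syllabified:
        if ch == "-":
            cuts.add(pos)   # a boundary falls before word[pos]
        else:
            pos += 1
    return word, cuts


def train_bigram_model(training):
    """Toy learner (hypothetical, not from the paper): majority vote per letter pair."""
    votes = Counter()
    for syllabified in training:
        word, cuts = boundaries(syllabified)
        for i in range(1, len(word)):
            votes[word[i - 1:i + 1]] += 1 if i in cuts else -1

    def syllabify(word):
        out = word[:1]
        for i in range(1, len(word)):
            if votes[word[i - 1:i + 1]] > 0:   # unseen or tied pairs get no boundary
                out += "-"
            out += word[i]
        return out

    return syllabify


def leave_one_out_word_accuracy(lexicon, train):
    """Percentage of words syllabified exactly right when each is held out in turn."""
    correct = 0
    for k, gold in enumerate(lexicon):
        syllabify = train(lexicon[:k] + lexicon[k + 1:])   # train without word k
        word, _ = boundaries(gold)
        if syllabify(word) == gold:
            correct += 1
    return 100.0 * correct / len(lexicon)


toy_lexicon = ["ca-sa", "ca-ne", "pa-ne", "ma-no"]   # illustrative data only
print(leave_one_out_word_accuracy(toy_lexicon, train_bigram_model))  # 75.0 on this toy list
```

Retraining from scratch for every held-out word keeps the sketch short; on a lexicon of tens of thousands of words, one would more plausibly build the model once and exclude only the held-out word's contribution at test time.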

Bibliographic reference.  Adsett, Connie R. / Marchand, Yannick (2007): "Are rule-based syllabification methods adequate for languages with low syllabic complexity? the case of Italian", In SSW6-2007, 58-63.