Sixth ISCA Workshop on Speech Synthesis
Automatic syllabification of words is challenging, not least because the syllable is difficult to define precisely. The task is important for word modelling in the composition process of concatenative synthesis as well as in automatic speech recognition. There are two broad approaches to automatic syllabification: rule-based and data-driven. The rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm infers ‘new’ syllabifications from examples assumed to be already correctly syllabified. This paper compares the performance of the two basic approaches. However, it is difficult to determine a correct syllabification in all cases, and hence to establish the quality of the ‘gold standard’ corpus that is used both to evaluate the output of an automatic algorithm quantitatively and as the example set on which data-driven methods crucially depend. Three lexical databases of pre-syllabified words were therefore used. Two of these lexicons hold the same 18,016 words, with their syllabifications coming from independent sources, whereas the third consists of the 13,594 words that are syllabified identically by both sources. As well as one rule-based approach (Fisher’s implementation of Kahn’s syllabification theory), three data-driven techniques are evaluated: a look-up procedure, an exemplar-based generalization technique, and syllabification by analogy (SbA). The results on the three databases show consistent and robust patterns: the data-driven techniques outperform the rule-based system in word and juncture accuracies by a very significant margin, and the best results are obtained with SbA.
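As an illustration of the evaluation described above, the following minimal sketch shows one way a consensus lexicon and the two accuracy figures could be computed. It is not the authors' code. The function names, the hyphen as boundary marker, the use of letters rather than phonemes, and the toy words are all assumptions made for the example. Word accuracy is taken here as the fraction of words whose boundaries all match the gold standard, and juncture accuracy as the fraction of positions between adjacent symbols that are correctly labelled as boundary or non-boundary.

```python
def boundaries(syllabified, sep="-"):
    """Return the unmarked word and the set of juncture positions holding a boundary."""
    marks = set()
    n = 0  # number of symbols seen so far
    for ch in syllabified:
        if ch == sep:
            marks.add(n)
        else:
            n += 1
    return syllabified.replace(sep, ""), marks


def consensus(lex_a, lex_b):
    """Keep only the words that both sources syllabify identically."""
    return {w: s for w, s in lex_a.items() if lex_b.get(w) == s}


def score(predicted, gold, sep="-"):
    """Return (word accuracy, juncture accuracy) of predicted against gold."""
    words_ok = 0
    junctures_ok = 0
    junctures_total = 0
    for word, gold_syl in gold.items():
        plain, gold_marks = boundaries(gold_syl, sep)
        _, pred_marks = boundaries(predicted[word], sep)
        positions = range(1, len(plain))  # junctures lie between adjacent symbols
        junctures_total += len(positions)
        junctures_ok += sum((p in gold_marks) == (p in pred_marks) for p in positions)
        words_ok += gold_marks == pred_marks
    return words_ok / len(gold), junctures_ok / junctures_total


# Hypothetical toy data, not taken from the lexicons used in the paper.
gold = {"window": "win-dow", "banana": "ba-na-na"}
predicted = {"window": "wind-ow", "banana": "ba-na-na"}

print(score(predicted, gold))      # (0.5, 0.8)
print(consensus(gold, predicted))  # {'banana': 'ba-na-na'}
```

In this toy case one misplaced boundary makes the whole word wrong but affects only two of its five junctures, which is why juncture accuracy is typically higher than word accuracy.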
Bibliographic reference. Marchand, Yannick / Adsett, Connie R. / Damper, Robert I. (2007): "Evaluating automatic syllabification algorithms for English", In SSW6-2007, 316-321.