Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
Dictionary look-up is the primary strategy for deriving pronunciations for input words in a text-to-speech (TTS) system. This strategy is accurate for dictionary words, but it is not complete: it is impossible to list all input words exhaustively. The proper treatment of 'unknown' words is currently an unsolved problem in TTS synthesis. There are many competing techniques for letter-to-sound conversion, and the system developer must make a rational selection among them. However, it is unclear how different techniques should be properly compared. In this paper, we report a comparative assessment of four competing methods (letter-to-sound rules, pronunciation by analogy, feedforward neural networks and a k-nearest neighbour method) with respect to their success at automatic phonemisation. This is achieved by using standardised scoring methods, a standardised test lexicon and standardised phoneme inventories. The problem of standardising the phoneme set ('harmonisation') is deceptive: it is much harder than it at first appears. The principal finding is that (contrary to the weight of opinion expressed in the literature) data-driven techniques outperform knowledge-based methods by a very significant margin.
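The abstract names a k-nearest neighbour method and a standardised score without giving details. As a hedged illustration of how dictionary look-up and a data-driven fallback for unknown words fit together, here is a minimal Python sketch; it is not the authors' implementation. It assumes a lexicon already aligned letter-to-phoneme (with `-` as a null phoneme for silent letters), one letter of context on each side, Hamming distance between letter windows, and a percentage-of-words-correct score. All names (`pronounce`, `knn_phoneme`, `percent_words_correct`), the toy lexicon and the phoneme symbols are hypothetical.

```python
from collections import Counter

PAD = "#"   # boundary symbol padding each word (assumption)
NULL = "-"  # null phoneme for silent letters (assumption)

# Hypothetical toy lexicon, pre-aligned so that each letter maps to exactly one
# phoneme symbol (possibly NULL). Real lexicons need an alignment step first.
ALIGNED = {
    "cat": ["k", "a", "t"],
    "cap": ["k", "a", "p"],
    "bat": ["b", "a", "t"],
    "tab": ["t", "a", "b"],
    "pat": ["p", "a", "t"],
    "cape": ["k", "ei", "p", NULL],
}

def windows(word, width=1):
    """Yield one letter window (left context, letter, right context) per letter."""
    padded = PAD * width + word + PAD * width
    for i in range(len(word)):
        yield padded[i : i + 2 * width + 1]

def build_instances(aligned, width=1):
    """Turn an aligned lexicon into (window, phoneme) training instances."""
    instances = []
    for word, phones in aligned.items():
        assert len(word) == len(phones), "lexicon must be letter-aligned"
        instances.extend(zip(windows(word, width), phones))
    return instances

def hamming(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_phoneme(instances, window, k=1):
    """Majority phoneme among the k stored windows closest to `window`.
    sorted() is stable, so distance ties go to the earlier training instance."""
    nearest = sorted(instances, key=lambda inst: hamming(inst[0], window))[:k]
    return Counter(ph for _, ph in nearest).most_common(1)[0][0]

def pronounce(word, dictionary, instances, width=1, k=1):
    """Dictionary look-up first; k-NN letter-to-sound only for unknown words."""
    if word in dictionary:
        return dictionary[word]
    phones = (knn_phoneme(instances, w, k) for w in windows(word, width))
    return [p for p in phones if p != NULL]  # drop null phonemes

def percent_words_correct(test_lexicon, predict):
    """Percentage of test words whose whole predicted pronunciation is right."""
    correct = sum(predict(w) == ref for w, ref in test_lexicon.items())
    return 100.0 * correct / len(test_lexicon)

DICTIONARY = {w: [p for p in ph if p != NULL] for w, ph in ALIGNED.items()}
INSTANCES = build_instances(ALIGNED)
# Held-out "unknown" words with reference pronunciations (hypothetical).
TEST_LEXICON = {"bap": ["b", "a", "p"], "tap": ["t", "a", "p"], "tape": ["t", "ei", "p"]}

if __name__ == "__main__":
    print(pronounce("cape", DICTIONARY, INSTANCES))  # ['k', 'ei', 'p'] (dictionary hit)
    print(pronounce("tape", DICTIONARY, INSTANCES))  # ['t', 'a', 'p'] (k-NN guess, wrong)
    score = percent_words_correct(TEST_LEXICON, lambda w: pronounce(w, DICTIONARY, INSTANCES))
    print(round(score, 1))  # 66.7
```

On this toy data, 'tape' is mispronounced: the one-letter window around 'a' ("tap") cannot see the final 'e', so its nearest neighbour is the 'a' of "cap" rather than the 'ei' of "cape". Scoring whole words rather than individual phonemes is strict, since a single wrong phoneme makes the whole word count as an error.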
Bibliographic reference. Damper, Robert I. / Marchand, Y. / Adamson, M. J. / Gustafson, Kjell (1998): "Comparative Evaluation of Letter-to-Sound Conversion Techniques for English Text-to-Speech Synthesis", In SSW3-1998, 53-58.