4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A Recurrent Network that Learns to Pronounce English Text

M. J. Adamson, Robert I. Damper

Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton, UK

Previous attempts to derive connectionist models for text-to-phoneme conversion (such as NETtalk and NETspeak) have generally used pre-aligned training data and purely feedforward networks, both of which simplify the problem. In this work, we explore the potential of recurrent networks to perform the conversion task when trained on non-aligned data. Initially, a single recurrent network produced disappointing results. This led us to define a two-phase model in which the hidden-unit representation of an auto-associative network is fed forward to a recurrent network. Although this model does not yet perform as well as NETspeak, it solves a harder problem, because it does not rely on pre-aligned training data. We also propose several possible avenues for improvement.
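The data flow of the two-phase model can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the padded one-hot word coding, the `MAX_LEN` limit, the layer sizes, sigmoid units, Elman-style context units, the fixed number of output steps, and the names `AutoAssociator`, `ElmanNet` and `pronounce` are all hypothetical choices. The weights are random and untrained, and both training phases are omitted. One plausible reading of "non-aligned" is shown: the whole-word code is presented at every step, so output frames are not tied to letter positions.

```python
import math
import random

# Hypothetical encoding choices, not taken from the paper.
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"  # "_" pads short words
MAX_LEN = 8                               # longest word this sketch accepts


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    # Untrained weights; training would adjust these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def layer(weights, bias, inputs):
    # One fully connected layer of sigmoid units.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]


def word_vector(word):
    # Concatenated one-hot letter codes for a word padded to MAX_LEN.
    vec = []
    for ch in word.ljust(MAX_LEN, "_"):
        one_hot = [0.0] * len(ALPHABET)
        one_hot[ALPHABET.index(ch)] = 1.0
        vec.extend(one_hot)
    return vec


class AutoAssociator:
    """Phase one: reproduces its input through a narrower hidden layer."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_enc = random_matrix(n_hidden, n_in, rng)
        self.b_enc = [0.0] * n_hidden
        self.w_dec = random_matrix(n_in, n_hidden, rng)
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        # Hidden-unit representation handed on to phase two.
        return layer(self.w_enc, self.b_enc, x)

    def reconstruct(self, x):
        # When training phase one, the target for this output is x itself.
        return layer(self.w_dec, self.b_dec, self.encode(x))


class ElmanNet:
    """Phase two: hidden activations are copied back as context each step."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.n_hidden = n_hidden
        self.w_hid = random_matrix(n_hidden, n_in + n_hidden, rng)
        self.b_hid = [0.0] * n_hidden
        self.w_out = random_matrix(n_out, n_hidden, rng)
        self.b_out = [0.0] * n_out

    def run(self, code, n_steps):
        context = [0.0] * self.n_hidden
        outputs = []
        for _ in range(n_steps):
            hidden = layer(self.w_hid, self.b_hid, code + context)
            outputs.append(layer(self.w_out, self.b_out, hidden))
            context = hidden  # copy-back of hidden activations
        return outputs


def pronounce(word, auto, rnn, n_steps):
    # The same whole-word code is presented at every step, so the output
    # frames are not aligned letter-by-letter with the input.
    return rnn.run(auto.encode(word_vector(word)), n_steps)


rng = random.Random(0)
auto = AutoAssociator(MAX_LEN * len(ALPHABET), 20, rng)
rnn = ElmanNet(20, 16, 10, rng)
frames = pronounce("phone", auto, rnn, n_steps=4)
```

Each of the four frames is a vector of 10 output activations that, after training, would stand for a (hypothetical) phoneme code; deciding when the pronunciation ends is left out of the sketch.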


Bibliographic reference. Adamson, M. J. / Damper, Robert I. (1996): "A recurrent network that learns to pronounce English text", In ICSLP-1996, 1704-1707.