First International Conference on Spoken Language Processing (ICSLP 90)
A neural network architecture was designed to locate word boundaries and identify words from phoneme sequences. The architecture was tested in three studies. In the first, a highly redundant corpus with a restricted vocabulary was generated, and the network was trained on a limited number of phonemic variants of the words in the corpus. Tests of network performance on a transfer set (i.e., sentences not used during training) yielded a very low error rate. In the second study, a network was trained to identify words from expert transcriptions of speech. On a transfer test, the error rate for simultaneously identifying words and their boundaries was 31%. The third study used the output of a phoneme classifier as the input to the word and word-boundary identification network; the error rate on a transfer test set was 49% for this task. Overall, these studies are a first step toward identifying words in connected discourse with a neural network. Although the results are moderately encouraging, word boundaries were especially difficult to identify when the input sequences contained many inserted and deleted phonemes. Many issues, including scaling the model to larger corpora, remain unresolved, and many strategies for improving performance have yet to be explored.
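The abstract reports error rates for identifying words and word boundaries together, without stating the scoring rule. The sketch below shows one plausible reading, assumed here and not taken from the paper: a reference word counts as correct only if the output contains the same word label over exactly the same span of phonemes. The names `joint_error_rate` and `spans_from_words`, the span representation, and the example words and phoneme counts are all hypothetical, and this simplified rule does not penalize inserted words.

```python
from typing import List, Tuple

# Hypothetical representation: (word label, start phoneme index, end phoneme index, exclusive)
Span = Tuple[str, int, int]


def spans_from_words(words: List[Tuple[str, int]]) -> List[Span]:
    """Lay (word, phoneme_count) pairs end to end and return their spans."""
    spans: List[Span] = []
    pos = 0
    for label, length in words:
        spans.append((label, pos, pos + length))
        pos += length
    return spans


def joint_error_rate(reference: List[Span], hypothesis: List[Span]) -> float:
    """Fraction of reference words not matched in both label and boundaries.

    Assumed scoring rule for illustration only: a word with the right label
    but a misplaced boundary counts as an error; insertions are ignored.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    hyp = set(hypothesis)
    correct = sum(1 for span in reference if span in hyp)
    return 1.0 - correct / len(reference)


if __name__ == "__main__":
    # Made-up example over the same 8 phonemes.
    ref = spans_from_words([("the", 2), ("cat", 3), ("sat", 3)])
    hyp = spans_from_words([("the", 2), ("cat", 4), ("at", 2)])
    # "cat" has the right label but the wrong end boundary, so only "the" is correct.
    print(joint_error_rate(ref, hyp))
```

Under this rule a boundary error costs as much as a wrong word label, which is consistent with the abstract's observation that inserted and deleted phonemes made boundaries especially hard to get right.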
Bibliographic reference. Allen, R. B. / Kamm, C. / James, S. B. (1990): "A recurrent neural network for word identification from phoneme sequences", In ICSLP-1990, 1037-1040.