Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
An artificial neural network has been trained by the error back-propagation technique to recognise phonemes and words. The speech material was recorded by a male Swedish talker and was labelled by a phonetician. There were 38 output nodes corresponding to Swedish phonemes. The training algorithm was somewhat modified to increase the training speed. Introducing coarticulation information by adding simple recurrency to the net is shown to be more effective than expanding the size of the input spectral window. The phoneme recognition network was used with dynamic programming for time alignment to recognise connected digits. It was compared to a similar recogniser based on nine quasi-phonetic features instead of 38 phonemes. The phoneme-based system performed better than the feature-based one.
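As a rough illustration of how frame-level network outputs can be combined with dynamic programming for time alignment, the sketch below scores a word by finding the best monotonic assignment of frames to its phoneme sequence. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names `align_score` and `recognise`, the use of log scores, the toy lexicon format, and the restriction to a single isolated word (rather than connected digits) are all hypothetical simplifications.

```python
import math

NEG_INF = -math.inf


def align_score(frame_logprobs, phoneme_seq):
    """Return the best left-to-right alignment score of frames to phonemes.

    frame_logprobs: list with one dict per frame, mapping phoneme label
                    to a log score (assumed to come from the network outputs).
    phoneme_seq:    phoneme labels of one word, in order (hypothetical lexicon entry).
    Every frame is assigned to exactly one phoneme, and every phoneme in the
    sequence must cover at least one frame.
    """
    n_frames, n_phonemes = len(frame_logprobs), len(phoneme_seq)
    if n_frames == 0 or n_frames < n_phonemes:
        return NEG_INF

    # prev[j] = best score of a path that ends in phoneme j at the previous frame
    prev = [NEG_INF] * n_phonemes
    prev[0] = frame_logprobs[0][phoneme_seq[0]]

    for t in range(1, n_frames):
        cur = [NEG_INF] * n_phonemes
        for j in range(n_phonemes):
            stay = prev[j]                              # remain in the same phoneme
            advance = prev[j - 1] if j > 0 else NEG_INF  # move on from the previous one
            best = max(stay, advance)
            if best > NEG_INF:
                cur[j] = best + frame_logprobs[t][phoneme_seq[j]]
        prev = cur

    # A valid alignment must end in the last phoneme of the word.
    return prev[-1]


def recognise(frame_logprobs, lexicon):
    """Pick the lexicon word whose phoneme sequence aligns best to the frames."""
    return max(lexicon, key=lambda word: align_score(frame_logprobs, lexicon[word]))
```

For connected digits, the same recursion would have to be extended so that a path may leave the last phoneme of one word and enter the first phoneme of the next; that extension is omitted here.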
Bibliographic reference. Elenius, Kjell / Blomberg, Mats (1992): "Comparing phoneme and feature based speech recognition using artificial neural networks", In ICSLP-1992, 1279-1282.