This paper compares the merits of two alternative front-ends for a connectionist vowel recogniser: a psychologically motivated auditory hair-cell model, and a conventional FFT of the kind commonly employed in speech recognisers. Our results demonstrate that the choice of front-end has a marked impact on vowel recognition, particularly when the networks are trained and tested on speech in the presence of noise. The two front-ends differ in a number of respects that might account for their differing performance. One notable difference is that the FFT-based front-end uses a cubic output compression function, whereas the auditory model has an intrinsically logarithmic compression function. By rescaling the output values of the two front-ends, we demonstrate that the difference in performance is determined by the output compression function and not by the choice of front-end per se. These results serve two roles. First, they suggest that when alternative front-ends for connectionist recognisers are compared, their outputs should be transformed so that all of them are subject to the same effective compression function. Second, we have used them to make fine adjustments to the auditory model.
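The rescaling idea can be illustrated with a minimal sketch: undo one front-end's compression, then apply the other's, so that both sets of outputs share the same effective compression function. This is not the authors' procedure. The function names are hypothetical, the floor value is arbitrary, and the "cubic" compression of the FFT front-end is assumed here to mean a cube-root power law applied to non-negative energies.

```python
import math
from typing import Callable, List, Sequence


def log_compress(x: float, floor: float = 1e-10) -> float:
    """Logarithmic compression, as attributed to the auditory model.

    The floor (an assumed value) avoids log(0) for silent channels.
    """
    return math.log(max(x, floor))


def log_expand(y: float) -> float:
    """Inverse of log_compress (above the floor)."""
    return math.exp(y)


def cube_root_compress(x: float) -> float:
    """Assumed reading of the FFT front-end's 'cubic' compression: x ** (1/3).

    Expects non-negative energies.
    """
    return x ** (1.0 / 3.0)


def cube_root_expand(y: float) -> float:
    """Inverse of cube_root_compress."""
    return y ** 3


def recompress(
    outputs: Sequence[float],
    expand: Callable[[float], float],
    compress: Callable[[float], float],
) -> List[float]:
    """Map outputs from one compression function onto another.

    Each value is first returned to a linear energy scale with `expand`,
    then compressed again with `compress`.
    """
    return [compress(expand(y)) for y in outputs]


# Example: put log-compressed auditory-model outputs on the cube-root scale.
auditory_outputs = [log_compress(e) for e in (1.0, 8.0, 27.0)]
rescaled = recompress(auditory_outputs, log_expand, cube_root_compress)
print(rescaled)  # approximately [1.0, 2.0, 3.0]
```

Transforming in either direction works the same way. For a real front-end the inverse may only be approximate (for example, below the log floor), so this sketch should be read as an idealisation.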
Bibliographic reference. Thurston, Peter / Norris, Dennis (1991): "A comparison of two compression functions used for noisy vowel detection with back-propagation networks", In EUROSPEECH-1991, 995-998.