Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
In this paper, we investigate and compare the effectiveness of 11 signal representations for phonetic classification. We also study how they interact with different classification paradigms and feature extraction techniques. In addition, to quantify the effect of the telephone network on high-quality wide-band speech, we compare our results on TIMIT and NTIMIT. All of our classification experiments use approximately the same number of input dimensions. The signal representations we study fall into several major categories, based on Fourier analysis, linear prediction, cepstral analysis, and auditory processing. Our results indicate that the classification error rate depends on whether the classification technique is well matched to the signal representation. When a single Gaussian or a multi-layer perceptron is used, the DFT-based representations and PLP tend to have lower error rates than the other representations. Compared to our earlier studies, our best error rates have been reduced to 22% on TIMIT and 30% on NTIMIT. Furthermore, we have found that the telephone network consistently increases the phonetic classification error rate by a factor of 1.3.
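As a quick arithmetic check on the figures quoted above, the ratio of the best NTIMIT error rate to the best TIMIT error rate can be computed directly. This is only a sketch: the function and variable names are illustrative and do not come from the paper. The factor of 1.3 is reported as a consistent trend across experiments, so the ratio of the two best rates need not match it exactly.

```python
# Best phonetic classification error rates quoted in the abstract (percent).
timit_error = 22.0    # wide-band speech (TIMIT)
ntimit_error = 30.0   # telephone-network speech (NTIMIT)


def degradation_factor(wideband_error: float, telephone_error: float) -> float:
    """Return the ratio of telephone-band to wide-band error rate.

    Hypothetical helper for illustration; not part of the paper.
    """
    if wideband_error <= 0:
        raise ValueError("wide-band error rate must be positive")
    return telephone_error / wideband_error


factor = degradation_factor(timit_error, ntimit_error)
print(f"NTIMIT / TIMIT error ratio: {factor:.2f}")
```

For the two best rates this ratio comes out slightly above the rounded factor of 1.3 stated in the abstract.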
Bibliographic reference. Chigier, Benjamin / Leung, Hong C. (1992): "The effects of signal representations, phonetic classification techniques, and the telephone network", In ICSLP-1992, 97-100.