5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Time Shift Invariant Speech Recognition

Sankar Basu, Abraham Ittycheriah, Stéphane Maes

IBM T.J. Watson Research Center, USA

When a speech signal is shifted by a few samples, we have observed significant variations in the feature vectors produced by the acoustic front end. Furthermore, when these shifted utterances are decoded with a continuous speech recognition system, the resulting word error rates differ dramatically. This paper analyzes the phenomenon and illustrates the well-known result that classical acoustic front-end processors, including spectrum- and cepstrum-based techniques, are sensitive to time shifts. After describing the effect of sample-sized shifts on the spectral estimates of the signal, we propose several techniques that take advantage of shift variations to multiply the amount of training data that speech utterances can provide. Finally, we illustrate how the acoustic front end can be slightly modified to render the recognizer invariant to small shifts.


Bibliographic reference. Basu, Sankar / Ittycheriah, Abraham / Maes, Stéphane (1998): "Time shift invariant speech recognition", In ICSLP-1998, paper 0983.