First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Time-Frequency Spectral Analysis of Speech

David Rainton (1), S. J. Young (2)

(1) ATR Interpreting Telephony Research Laboratories
(2) Cambridge University Department of Engineering, UK

In recent years there has been growing interest within the speech research community in spectral estimators that circumvent the traditional quasi-stationary assumption and provide greater time-frequency (t-f) resolution than conventional spectral estimators such as the short-time Fourier power spectrum (STFPS). One distribution in particular, the Wigner distribution (WD), has attracted considerable interest. However, experimental studies have indicated that, despite its improved t-f resolution, employing the WD as the front end of a speech recognition system actually reduces recognition performance; recognition rates improve only when t-f smoothing is explicitly re-introduced into the WD. By re-formulating the spectral estimation problem as a bias-variance optimisation task, we provide an explanation for these previous experimental findings.
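
For readers unfamiliar with the WD, the following is a minimal sketch of a discrete pseudo-Wigner distribution, given only as an illustration and not as the estimator used in the paper. The function name `pseudo_wigner`, the parameter `half_len`, and the choice of a Hann lag window are assumptions made for this example. The input is assumed to be a complex analytic signal (for instance, one derived from a real waveform by a Hilbert transform), which avoids the aliasing that the lag product would otherwise introduce.

```python
import numpy as np

def pseudo_wigner(x, half_len):
    """Discrete pseudo-Wigner distribution of a complex signal x.

    Returns a real array of shape (len(x), 2 * half_len). Row t holds the
    spectrum at sample t; bin k corresponds to k / (4 * half_len) cycles per
    sample, because the lag product x[t+m] * conj(x[t-m]) doubles frequency.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    n_fft = 2 * half_len
    lags = np.arange(-half_len + 1, half_len)        # symmetric lag range
    window = np.hanning(2 * half_len + 1)[1:-1]      # symmetric lag window (assumed Hann)
    wd = np.zeros((n, n_fft))
    for t in range(n):
        kernel = np.zeros(n_fft, dtype=complex)
        for m, w in zip(lags, window):
            # Skip lags that would reach outside the signal.
            if 0 <= t + m < n and 0 <= t - m < n:
                kernel[m % n_fft] = w * x[t + m] * np.conj(x[t - m])
        # The kernel is Hermitian in the lag, so its transform is real.
        wd[t] = np.fft.fft(kernel).real
    return wd

# Usage: a complex sinusoid at 0.125 cycles per sample.
x = np.exp(2j * np.pi * 0.125 * np.arange(128))
wd = pseudo_wigner(x, 32)
peak_bin = int(np.argmax(wd[64]))                    # bin with the largest value mid-signal
```

No time smoothing is applied here: each row depends only on lag products centred on a single sample, which is what gives the WD its sharp t-f resolution and also leaves its variance unreduced.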

A practical adaptive smoothing algorithm is introduced, which attempts to match the degree of smoothing applied to the WD to the time-varying quasi-stationary regions within the speech waveform. The recognition performance of the resulting adaptively smoothed estimator is found to be comparable to that of conventional filterbank estimators, yet the average temporal sampling rate of the resulting spectral vectors is reduced by around a factor of ten.
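
The abstract does not specify the adaptive algorithm, so the sketch below is only a hypothetical illustration of the general idea: averaging consecutive spectral frames for as long as they remain close to the running segment mean, so that long stationary stretches collapse into a single vector. The function name `adaptive_time_smooth`, the relative-distance criterion, and the `threshold` parameter are all assumptions for this example and should not be read as the authors' method.

```python
import numpy as np

def adaptive_time_smooth(frames, threshold):
    """Greedily average consecutive spectral frames within stationary runs.

    frames:    array of shape (n_frames, n_bins), one spectral vector per row.
    threshold: hypothetical relative-distance limit for joining a segment.
    Returns (smoothed, starts): one averaged vector per segment and the
    index of the first frame of each segment.
    """
    frames = np.asarray(frames, dtype=float)
    smoothed, starts = [], []
    seg_sum, seg_len, seg_start = frames[0].copy(), 1, 0
    for t in range(1, len(frames)):
        mean = seg_sum / seg_len
        # Relative distance of the new frame from the current segment mean.
        dist = np.linalg.norm(frames[t] - mean) / (np.linalg.norm(mean) + 1e-12)
        if dist <= threshold:
            seg_sum += frames[t]                     # still stationary: extend segment
            seg_len += 1
        else:
            smoothed.append(mean)                    # close segment, start a new one
            starts.append(seg_start)
            seg_sum, seg_len, seg_start = frames[t].copy(), 1, t
    smoothed.append(seg_sum / seg_len)
    starts.append(seg_start)
    return np.array(smoothed), starts

# Usage: twenty frames containing one abrupt spectral change.
frames = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 1.0], (10, 1))])
smoothed, starts = adaptive_time_smooth(frames, 0.1)
```

In this toy input the twenty frames reduce to two averaged vectors, which illustrates how segment-matched smoothing can lower the average temporal sampling rate of the output.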

Bibliographic reference.  Rainton, David / Young, S. J. (1990): "Time-frequency spectral analysis of speech", In ICSLP-1990, 349-352.