4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Spectral Estimation and Normalisation for Robust Speech Recognition

Tom Claes, Fei Xie, Dirk van Compernolle

K.U. Leuven - E.S.A.T., Heverlee, Belgium

Speech recognition in adverse conditions remains a difficult problem. It has already been shown [1] that normalising the dynamic range (SNR) of the frequency channels in a mel-scale triangular filterbank (MFCC) [2] improves robustness against both additive and convolutional noise. However, because the method is based on a masking technique, the improvement is small when the SNR is lower than the target (normalised) SNR. One solution is to enhance the filterbank energies before the masking technique is applied. For this purpose we developed a Non-linear Spectral Estimator (NSE) for speech recognition that operates on the log filterbank energies. NSE enhances these filterbank energies and makes SNR-normalisation effective even at very low SNRs. Experimental results are given on the NOISEX-92 [3] database. Recognition performance improves even at 0 dB SNR.
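
The abstract does not spell out the masking step, so the following is only a minimal sketch of one way dynamic-range normalisation by masking could look, not the authors' method. It assumes that the per-channel peak over the utterance stands in for the speech level and that masking is a hard floor at a fixed distance below that peak. The function name `mask_to_target_range` and the parameter `target_range_db` are hypothetical.

```python
def mask_to_target_range(frames_db, target_range_db):
    """Clamp each filterbank channel so that its peak-to-floor range
    is at most target_range_db.

    frames_db: list of frames, each a list of log filterbank energies in dB.
    Assumption: the per-channel peak is used as the reference level.
    """
    n_channels = len(frames_db[0])
    # Per-channel peak over all frames (assumed stand-in for the speech level).
    peaks = [max(frame[c] for frame in frames_db) for c in range(n_channels)]
    # Masking floor: everything below (peak - target range) is raised to it.
    floors = [peak - target_range_db for peak in peaks]
    return [
        [max(value, floors[c]) for c, value in enumerate(frame)]
        for frame in frames_db
    ]


# Two channels, three frames. Channel 0 spans 40 dB, channel 1 only 10 dB.
frames = [
    [60.0, 40.0],
    [20.0, 35.0],
    [55.0, 30.0],
]
masked = mask_to_target_range(frames, target_range_db=20.0)
```

In this toy input, channel 0 is compressed to the 20 dB target, while channel 1, whose range is already below the target, passes through unchanged. Under these assumptions, that reflects the limitation noted in the abstract: masking alone has little effect on channels whose SNR is below the target, which is the case the enhancement stage is meant to address.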

Bibliographic reference.  Claes, Tom / Xie, Fei / Compernolle, Dirk van (1996): "Spectral estimation and normalisation for robust speech recognition", In ICSLP-1996, 1997-2000.