4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Speech recognition in adverse conditions remains a difficult and challenging problem. It has already been shown that normalising the dynamic range (SNR) of the frequency channels in a mel-scale triangular filterbank (MFCC) improves robustness against both additive and convolutional noise. Nevertheless, because the method is based on a masking technique, the improvement is small when the SNR is lower than the target (normalised) SNR. This problem can be addressed by first enhancing the filterbank energies before the masking technique is applied. For this purpose we developed a Non-linear Spectral Estimator (NSE) for speech recognition that operates on the log filterbank energies. NSE enhances these filterbank energies and makes SNR-normalisation effective at very low SNRs as well. Experimental results are given on the NOISEX-92 database. Better recognition performance is observed even at 0 dB SNR.
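The abstract does not spell out the masking step, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each channel's dynamic range is limited by flooring its log energy at the channel peak minus a target range in dB. The function name `normalise_dynamic_range`, the per-utterance peak, and the use of a hard floor (rather than adding a masking level in the linear energy domain) are all assumptions made for illustration.

```python
def normalise_dynamic_range(log_energies, target_range_db):
    """Limit each channel's dynamic range to target_range_db by masking.

    log_energies: list of frames, each a list of per-channel log energies (dB).
    Assumption: the masking floor of a channel is its peak over all frames
    minus target_range_db; values below the floor are raised to the floor.
    """
    n_channels = len(log_energies[0])
    # Per-channel peak over the whole utterance (hypothetical choice).
    peaks = [max(frame[c] for frame in log_energies) for c in range(n_channels)]
    floors = [peak - target_range_db for peak in peaks]
    # Hard floor as a simple stand-in for masking.
    return [
        [max(value, floors[c]) for c, value in enumerate(frame)]
        for frame in log_energies
    ]


# Two channels, three frames. Channel 0 spans 30 dB, channel 1 only 5 dB.
frames = [[30.0, 10.0], [0.0, 5.0], [20.0, 8.0]]
masked = normalise_dynamic_range(frames, target_range_db=20.0)
```

In this toy input, channel 0 exceeds the 20 dB target and is floored, while channel 1 already has a smaller range than the target and passes through unchanged. Under these assumptions, that second case mirrors the limitation the abstract describes: masking alone changes nothing when the SNR is already below the target, which is why the energies are enhanced first.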
Bibliographic reference. Claes, Tom / Xie, Fei / Compernolle, Dirk van (1996): "Spectral estimation and normalisation for robust speech recognition", In ICSLP-1996, 1997-2000.