Third International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003)
Changes in the peak structure of the speech spectrum are perhaps the most important cause of performance degradation in speech recognition systems under adverse conditions. A further effect of additive noise appears in the flat spectral zones, which are usually raised in proportion to the noise level. These combined effects on both the peaked and the flat spectral zones can be alleviated by attempting to restore the original spectral structure, which requires knowledge of the noise. However, the random nature and variability of the noise and the difficulty of discriminating speech pauses, among other factors, discourage the use of noise estimates as the basis of robust speech recognition algorithms. Alternative approaches based on normalisation procedures are therefore very promising, since the noise effect can be alleviated without any knowledge of its presence. This paper proposes a spectral normalisation that, although different in nature, can be viewed as a noise estimation procedure on a frame-by-frame basis, and which therefore treats the clean database as lightly corrupted. This normalisation is used to restore the normalised speech spectrum. The normalised spectrum is then renormalised by a baseline spectral normalisation method that concentrates essentially on the low-energy speech regions; noise is more dominant in these regions, so they require a greater degree of robustness.
Full Paper (reprinted with permission from Firenze University Press)
Bibliographic reference. Lima, C. S. / Oliveira, J. F. (2003): "Spectral bi-normalisation for speech recognition in additive noise", In MAVEBA-2003, 103-106.