4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
The performance of speech recognition systems degrades rapidly in the presence of ambient noise. To reduce this degradation, a degradation model is proposed that represents the spectral changes of speech uttered in noisy environments. The model uses frequency warping and per-band amplitude scaling to simulate the Lombard-effect variations in formant location, formant bandwidth, pitch, spectral tilt, and the energy of each frequency band. Another Lombard effect, the variation of overall vocal intensity, is represented by a multiplicative constant that depends on the spectral magnitude of the input speech. Noise contamination is represented by an additive term in the frequency domain. Following this degradation model, the cepstral vector of clean speech is estimated from that of noisy-Lombard speech using spectral subtraction, spectral magnitude normalization, band-pass filtering in the LIN-LOG spectral domain, and multiple linear transformation. Noisy-Lombard speech data are collected by simulating noisy environments with noise from automobile cabins, an exhibition hall, downtown telephone booths, crowded streets, and computer rooms. The proposed method significantly reduces error rates in the recognition of 50 Korean words. For example, at an SNR (signal-to-noise ratio) of 10 dB, the recognition rate is 95.91% with this method and 79.68% without it.
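The degradation model and the first two compensation steps can be illustrated with a minimal sketch. The abstract does not give the model's equations, so the form used here, Y[k] = G · A[band(k)] · S(warp(k)) + N[k] over magnitude-spectrum bins, is an assumption. The function names `degrade`, `spectral_subtract`, `normalize_magnitude`, and `band_index`, the linear interpolation used for warping, and the spectral `floor` are likewise hypothetical. Band-pass filtering in the LIN-LOG domain and the multiple linear transformation are not sketched.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def band_index(k: int, band_edges: Sequence[int]) -> int:
    """Band whose start edge is the last one <= bin k (edges ascending, first is 0)."""
    return bisect_right(band_edges, k) - 1


def degrade(clean: Sequence[float],
            warp: Callable[[int], float],
            band_edges: Sequence[int],
            band_gains: Sequence[float],
            overall_gain: float,
            noise: Sequence[float]) -> List[float]:
    """Assumed (hypothetical) noisy-Lombard model: Y[k] = G * A[band(k)] * S(warp(k)) + N[k]."""
    n = len(clean)
    out = []
    for k in range(n):
        # Frequency warping: sample the clean spectrum at a clamped, warped bin
        # position, linearly interpolating between neighbouring bins.
        pos = min(max(float(warp(k)), 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        s = (1.0 - frac) * clean[lo] + frac * clean[hi]
        # Per-band amplitude scaling, overall vocal-intensity gain, additive noise.
        out.append(overall_gain * band_gains[band_index(k, band_edges)] * s + noise[k])
    return out


def spectral_subtract(noisy: Sequence[float], noise_est: Sequence[float],
                      floor: float = 0.01) -> List[float]:
    """Magnitude spectral subtraction; each result is kept at least floor * noisy value."""
    return [max(y - n, floor * y) for y, n in zip(noisy, noise_est)]


def normalize_magnitude(spec: Sequence[float]) -> List[float]:
    """Scale to unit mean magnitude, which cancels a constant overall gain."""
    mean = sum(spec) / len(spec)
    return [x / mean for x in spec]


clean = [1.0, 2.0, 3.0, 4.0]
noise = [0.5] * 4
noisy = degrade(clean, lambda k: float(k), [0, 2], [1.0, 1.0], 2.0, noise)
estimate = normalize_magnitude(spectral_subtract(noisy, noise))
print(noisy)     # [2.5, 4.5, 6.5, 8.5]
print(estimate)  # numerically equal to normalize_magnitude(clean)
```

With an identity warp and equal band gains, subtracting a known noise spectrum and normalizing to unit mean magnitude recovers the normalized clean spectrum, because the overall gain G cancels in the normalization. A non-identity warp (for example the hypothetical formant shift `lambda k: 0.9 * k`) or unequal band gains would not be undone by these two steps alone.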
Bibliographic reference. Chi, Sang-mun / Oh, Yung-Hwan (1996): "Lombard effect compensation and noise suppression for noisy Lombard speech recognition", In ICSLP-1996, 2013-2016.