INTERSPEECH 2006 - ICSLP
The energy parameter is widely used alongside the basic mel-frequency cepstral coefficient (MFCC) features to improve accuracy in speech recognition. In this paper, we propose a simple and effective approach to normalizing the energy of the silence (non-speech) portions of an utterance. This approach, named silence energy normalization (SEN), uses the high-pass filtered log-energy as the feature for speech/non-speech classification; the log-energy of non-speech frames is then set to a small constant, while that of speech frames is kept unchanged. In experiments conducted on the AURORA2 database, SEN provides an average word error rate reduction of 34.9% and 44.6% for Test Sets A and B, respectively, compared with the baseline processing. SEN also outperforms similar approaches such as energy subtraction (ES) and feature vector selection (FVS). Finally, we show that SEN can be integrated with cepstral mean and variance normalization (CMVN) to further improve recognition performance.
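The SEN procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the filter, the decision rule, or the constant, so everything concrete here is an assumption made for illustration: the first-order high-pass filter and its coefficient `alpha`, the default threshold (the mean of the filtered sequence), the default `silence_value`, and the function names themselves. It should not be read as the authors' implementation.

```python
import math


def log_energy(frames, floor=1e-10):
    """Log-energy of each frame: log of the sum of squared samples."""
    return [math.log(max(sum(s * s for s in frame), floor)) for frame in frames]


def high_pass(values, alpha=0.9):
    """Assumed first-order high-pass filter: y[t] = x[t] - x[t-1] + alpha * y[t-1]."""
    out = []
    prev_x = values[0] if values else 0.0  # start from the first sample, so y[0] = 0
    prev_y = 0.0
    for x in values:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def classify_speech(log_e, alpha=0.9, threshold=None):
    """Mark a frame as speech when its filtered log-energy exceeds the threshold.

    If no threshold is given, the mean of the filtered sequence is used
    (an assumption; the abstract does not state the decision rule).
    """
    if not log_e:
        return []
    filtered = high_pass(log_e, alpha)
    if threshold is None:
        threshold = sum(filtered) / len(filtered)
    return [y > threshold for y in filtered]


def silence_energy_normalize(log_e, silence_value=0.0, alpha=0.9, threshold=None):
    """Set non-speech log-energy to a constant; leave speech frames unchanged."""
    is_speech = classify_speech(log_e, alpha, threshold)
    return [e if speech else silence_value for e, speech in zip(log_e, is_speech)]


if __name__ == "__main__":
    # Toy log-energy contour: low-energy frames around a short high-energy burst.
    contour = [1.0, 1.0, 5.0, 5.0, 1.0, 1.0]
    print(classify_speech(contour))
    print(silence_energy_normalize(contour, silence_value=0.0))
```

In a full front end, the normalized log-energy would replace the original energy component of each feature vector before any further processing such as CMVN; the order of those steps is likewise an assumption here.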
Bibliographic reference. Tai, Chung-fu / Hung, Jeih-weih (2006): "Silence energy normalization for robust speech recognition in additive noise environment", In INTERSPEECH-2006, paper 1492-Thu2CaP.10.