We present a practical and noise-robust speech recognition system which estimates a target-to-interferers power ratio using a zerocrossing- based binaural model and applies the power ratio to a channel attentive missing feature decoder in the cepstral domain. In a natural multisource environment, our binaural model extracts spatial cues at each zero-crossing of a filterbank output signal to localize multiple sound sources and estimates a ratio mask reliably which segregates target speech from interfering noises. Our system uses gammatone filterbank cepstral coefficients (GFCCs) for the recognition and the channel attentive decoder utilizes the ratio mask on weighting the cepstral features when calculating the output probability in the Viterbi decoding. On the experiments of CHiME final testset, our channel attentive GFCC system improves the baseline recognition result 12.2% on average, and with noisy training condition, the average improvement amounts to 18.8%.
Bibliographic reference. Kim, Young-Ik / Cho, Hoon-Young / Kim, Sang-Hun (2011): "Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system", In INTERSPEECH-2011, 1649-1652.