Symposium on Machine Learning in Speech and Language Processing (MLSLP)

Bellevue, WA, USA
June 27, 2011

A Comparison of Performance Monitoring Approaches to Fusing Spectrogram Channels in Speech Recognition

Shirin Badiezadegan, Richard Rose

Department of Electrical and Computer Engineering, McGill University, Montreal, Canada

Implementations of two performance monitoring approaches to feature channel integration in robust automatic speech recognition are presented. Both approaches combine multiple feature channels: the first uses a feed-forward, entropy-based criterion, and the second, motivated by psychophysical evidence in human speech perception, employs a closed-loop criterion related to the overall performance of the system. The feature channels correspond to an ensemble of reconstructed spectrograms generated by applying multiresolution discrete wavelet transform analysis-synthesis filter-banks to corrupted speech spectrograms. The spectrograms associated with these feature channels differ in the degree to which information has been suppressed in multiple scales and frequency bands. The performance of these approaches is evaluated in the Aurora 3 speech-in-noise task domain.
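As a rough illustration of what a feed-forward, entropy-based combination of channels can look like, the sketch below applies inverse-entropy weighting to per-channel class posteriors for a single frame. This is a generic technique chosen for illustration, not necessarily the criterion used in the paper. The function names (`entropy`, `inverse_entropy_weights`, `fuse_frame`) and the idea that each channel yields a posterior distribution over classes are assumptions introduced here.

```python
import math


def entropy(posterior, eps=1e-12):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(max(p, eps)) for p in posterior)


def inverse_entropy_weights(channel_posteriors, eps=1e-12):
    """Weight each channel by 1/entropy, normalized to sum to one.

    Assumption for this sketch: a low-entropy (peaked) posterior marks
    a more reliable channel, so it receives a larger weight.
    """
    inverse = [1.0 / (entropy(p) + eps) for p in channel_posteriors]
    total = sum(inverse)
    return [value / total for value in inverse]


def fuse_frame(channel_posteriors):
    """Fuse the per-channel posteriors of one frame by a weighted sum."""
    weights = inverse_entropy_weights(channel_posteriors)
    n_classes = len(channel_posteriors[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, channel_posteriors))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical posteriors from two channels for the same frame.
confident = [0.90, 0.05, 0.05]      # peaked, low entropy
uncertain = [1 / 3, 1 / 3, 1 / 3]   # flat, maximum entropy
fused, weights = fuse_frame([confident, uncertain])
```

Because the weights sum to one and each input is a valid distribution, the fused vector is again a distribution. The weighting is feed-forward in the sense that it depends only on the channel outputs and not on any feedback from the recognizer's overall performance.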

Index Terms: spectrographic mask, wavelet-based denoising, spectrogram reconstruction


Bibliographic reference.  Badiezadegan, Shirin / Rose, Richard (2011): "A comparison of performance monitoring approaches to fusing spectrogram channels in speech recognition", In MLSLP-2011, 6-10.