13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise

Felix Weninger, Martin Wöllmer, Björn Schuller

Institute for Human-Machine Communication, Technische Universität München, Germany

We address speaker-independent automatic recognition of spontaneous speech in highly instationary noise by applying semi-supervised sparse non-negative matrix factorization (NMF) for speech enhancement, coupled with our recently proposed front end that uses bottleneck (BN) features generated by a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. In our evaluation, we combine the noise corpus and evaluation protocol of the 2011 PASCAL CHiME challenge with the Buckeye database. We demonstrate that the combination of NMF enhancement and the BN-BLSTM front end yields significant and consistent gains in word accuracy on this highly challenging task at signal-to-noise ratios from -6 to 9 dB.

Bibliographic reference. Weninger, Felix / Wöllmer, Martin / Schuller, Björn (2012): "Combining bottleneck-BLSTM and semi-supervised sparse NMF for recognition of conversational speech in highly instationary noise", in INTERSPEECH-2012, 302-305.