Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Missing Data Mask Models with Global Frequency and Temporal Constraints

Sébastien Demange, Christophe Cerisara, Jean-Paul Haton

LORIA, France

Missing data recognition was developed to increase the noise robustness of automatic speech recognition. Many factors, including the speech decoding process itself, must be considered when locating the masks. In this work, we consider Bayesian models of the masks, in which every spectral feature is classified as reliable or masked independently of the rest of the signal.
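As an illustration of what independent, per-feature Bayesian classification can look like, here is a minimal Python sketch. It is not the classifier from the paper: the scalar feature (a hypothetical local SNR estimate in dB), the Gaussian class models, their parameters, and the function names are all assumptions chosen for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify_cell(x, prior_reliable=0.5, reliable=(10.0, 5.0), masked=(-5.0, 5.0)):
    """Return True when 'reliable' has the larger posterior for one cell.

    `reliable` and `masked` are (mean, std) pairs; the values are
    illustrative assumptions, not trained parameters.
    """
    score_reliable = prior_reliable * gaussian_pdf(x, *reliable)
    score_masked = (1.0 - prior_reliable) * gaussian_pdf(x, *masked)
    return score_reliable > score_masked

def classify_mask(features):
    """Classify every time-frequency cell on its own.

    `features[t][f]` is the feature of frame t and frequency band f.
    No neighbouring cell influences the decision.
    """
    return [[classify_cell(x) for x in frame] for frame in features]
```

Because each decision ignores its neighbours, nothing in this scheme prevents adjacent cells from receiving alternating labels.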

This classification strategy can produce small, unrelated "spots", whereas experiments suggest that oracle reliable and unreliable features tend to cluster into time-frequency blocks. We call this undesired effect the "checkerboard" effect.
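One simple way to make the "checkerboard" effect measurable is to count the cells of a binary mask whose label differs from all of their immediate time and frequency neighbours. The Python sketch below does this. The measure and the function name are assumptions made for illustration and are not taken from the paper.

```python
def count_isolated_cells(mask):
    """Count cells whose label differs from every 4-connected neighbour.

    `mask[t][f]` is True for a reliable cell and False for a masked one,
    with t indexing frames and f indexing frequency bands.
    """
    n_frames = len(mask)
    n_bands = len(mask[0]) if n_frames else 0
    isolated = 0
    for t in range(n_frames):
        for f in range(n_bands):
            neighbours = []
            if t > 0:
                neighbours.append(mask[t - 1][f])  # previous frame
            if t < n_frames - 1:
                neighbours.append(mask[t + 1][f])  # next frame
            if f > 0:
                neighbours.append(mask[t][f - 1])  # lower band
            if f < n_bands - 1:
                neighbours.append(mask[t][f + 1])  # upper band
            if neighbours and all(n != mask[t][f] for n in neighbours):
                isolated += 1
    return isolated
```

A mask made of large uniform blocks scores zero under this count, while a perfectly alternating mask scores one per cell.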

In this paper, we propose a new Bayesian missing data classifier that integrates frequency and temporal constraints in order to reduce, or avoid, this "checkerboard" effect. The proposed classifier is evaluated on the Aurora2 connected digit corpus. Integrating such constraints into the missing data classification leads to significant improvements in recognition accuracy.

Bibliographic reference.  Demange, Sébastien / Cerisara, Christophe / Haton, Jean-Paul (2006): "Missing data mask models with global frequency and temporal constraints", In INTERSPEECH-2006, paper 1226-Thu2CaP.2.