ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing

ICC Jeju, Korea
October 3, 2004

Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition

John Hershey, Trausti Kristjansson, Zhengyou Zhang

Microsoft Research, Redmond, WA, USA

We present a probabilistic framework that uses a bone sensor and an air microphone to perform speech enhancement for robust speech recognition. The system exploits the advantages of both sensors: the noise resistance of the bone sensor and the linearity of the air microphone. In this paper we describe the general properties of the bone sensor relative to conventional air sensors. We propose a model capable of adapting to the noise conditions, and evaluate performance using a commercial speech recognition system. We demonstrate considerable improvements in recognition, from a baseline of 57% up to nearly 80% word accuracy, for four subjects in a difficult condition with background speaker interference.

Bibliographic reference. Hershey, John / Kristjansson, Trausti / Zhang, Zhengyou (2004): "Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition", in SAPA-2004, paper 139.