13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Q-Gaussian Based Spectral Subtraction for Robust Speech Recognition

Hilman F. Pardede (1), Koichi Shinoda (1), Koji Iwano (2)

(1) Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
(2) Faculty of Environmental and Information Studies, Tokyo City University, Yokohama, Japan

Spectral subtraction (SS) is derived by maximum likelihood estimation under the assumption that noise and speech both follow Gaussian distributions and are independent of each other. Under this assumption, noisy speech (speech contaminated by noise) also follows a Gaussian distribution. However, it is well known that noisy speech observed in real conditions often follows a heavy-tailed distribution rather than a Gaussian one. In this paper, we introduce the q-Gaussian distribution from non-extensive statistics to represent the distribution of noisy speech and derive a new spectral subtraction method based on it. In our analysis, the q-Gaussian distribution fits the noisy speech distribution better than the Gaussian distribution does. In speech recognition experiments on the Aurora-2 database, the proposed method, q-spectral subtraction (q-SS), outperformed the conventional SS method.

Index Terms: robust speech recognition, spectral subtraction, Gaussian distribution, q-Gaussian, maximum likelihood
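
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' q-SS derivation, which the abstract does not give. The first function evaluates the standard q-Gaussian density for 1 <= q < 3, which reduces to a Gaussian at q = 1 and has heavier tails as q grows. The second is a textbook power-domain spectral subtraction with a spectral floor, included only as a conventional baseline. The function names, the parameter `beta`, and the `floor` value are assumptions chosen for illustration.

```python
import math


def q_exp(x, q):
    """q-exponential: [1 + (1 - q) x]_+^(1 / (1 - q)); equals exp(x) at q = 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / (1.0 - q))


def q_gaussian_pdf(x, q, beta):
    """Standard q-Gaussian density, restricted here to 1 <= q < 3 (assumption).

    f(x) = sqrt(beta) / C_q * e_q(-beta * x^2)
    """
    if not (1.0 <= q < 3.0):
        raise ValueError("this sketch only covers 1 <= q < 3")
    if q == 1.0:
        c_q = math.sqrt(math.pi)
    else:
        c_q = (math.sqrt(math.pi) * math.gamma((3.0 - q) / (2.0 * (q - 1.0)))
               / (math.sqrt(q - 1.0) * math.gamma(1.0 / (q - 1.0))))
    return math.sqrt(beta) / c_q * q_exp(-beta * x * x, q)


def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Textbook power spectral subtraction per frequency bin (baseline only).

    Subtracts the noise power estimate and clamps the result to a small
    fraction of the noisy power so that no bin becomes negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]


if __name__ == "__main__":
    # Compare tail mass of a Gaussian (q = 1) and a heavier-tailed q-Gaussian.
    for q in (1.0, 1.5):
        print(q, q_gaussian_pdf(5.0, q, 0.5))
    print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))
```

With `beta = 0.5`, the q = 1 case is the standard normal density, so the printed values give a direct view of how much more probability the q = 1.5 curve places far from the mean.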


Bibliographic reference. Pardede, Hilman F. / Shinoda, Koichi / Iwano, Koji (2012): "Q-Gaussian based spectral subtraction for robust speech recognition", in INTERSPEECH-2012, 1255-1258.