In this study, we consider the problem of speaker recognition under non-stationary room/channel mismatched conditions. In such circumstances, cepstral coefficients are affected in a way that the short-term stationarity assumption, on which conventional feature normalization methods are based, may not be valid. We observe that empirical mode decomposition (EMD) applied to the cepstral feature stream can partially separate the non-stationary channel components, if present, into its residual signal and other lower-order intrinsic mode functions (IMFs), which leads us to develop a filtering scheme based on this decomposition. The proposed method works in the time domain, making use of the instantaneous frequency function obtained through Hilbert spectral analysis of the IMFs. Experimental evaluations on the TIMIT database, with non-stationary room channels added to the test data, demonstrate the superiority of the proposed scheme over conventional feature normalization schemes. Additional experiments performed on the newly released noisy robust open-set speaker identification (ROSSI) and NIST SRE corpora also confirm the effectiveness of the proposed method in stationary room/channel mismatched conditions.
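To illustrate the Hilbert spectral analysis step mentioned above, the following minimal sketch estimates the instantaneous frequency of a single component from the phase of its analytic signal. It is not the authors' filtering scheme: EMD itself is omitted, and one sinusoid stands in for an IMF. The function names, the 100 frames-per-second rate, and the toy input are all assumptions made for this illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short sequences."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Analytic signal: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency in Hz from the sample-to-sample phase increment."""
    z = analytic_signal(x)
    freqs = []
    for a, b in zip(z[:-1], z[1:]):
        # Phase advance between consecutive samples, wrapped to (-pi, pi].
        dphi = cmath.phase(b * a.conjugate())
        freqs.append(dphi * sample_rate / (2 * math.pi))
    return freqs


# Hypothetical stand-in for one IMF of a cepstral trajectory:
# a 5 Hz modulation observed at an assumed rate of 100 frames per second.
frame_rate = 100.0
imf = [math.cos(2 * math.pi * 5.0 * t / frame_rate) for t in range(200)]
freqs = instantaneous_frequency(imf, frame_rate)
```

For this toy input the estimate stays close to 5 Hz at every frame. With real IMFs the instantaneous frequency varies over time, and that variation is what a time-domain scheme of the kind described in the abstract would act on.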
Bibliographic reference. Hasan, Taufiq / Hansen, John H. L. (2011): "Robust speaker recognition in non-stationary room environments based on empirical mode decomposition", In INTERSPEECH-2011, 2733-2736.