13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Mask Estimation and Refinement for MFT-based Robust Speaker Verification

Yali Zhao, Lie Xie, Zhonghua Fu

Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an, China

Missing feature theory (MFT) has been proposed to improve speaker recognition performance in noisy environments. MFT-based speaker recognition requires a binary mask that identifies which feature components are reliable and which are unreliable. In this paper, a dual-microphone, semi-blind Degenerate Unmixing Estimation Technique (DUET) approach is proposed to estimate the binary mask. By using spatial information rather than conventional noise statistics, the proposed approach estimates the mask well, especially when the noise is non-stationary, e.g., interfering speech or music. Experimental results show that the proposed method achieves significant improvements over alternative approaches. We further refine the estimated binary mask by removing unreliable time frames and non-discriminative frequency subbands. Experiments demonstrate that the refined binary mask enhances the performance of MFT-based speaker verification and represents a promising direction for MFT-based applications.
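The refinement step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual criteria, so everything here is an assumption made for illustration: the function name `refine_mask`, the rule that a frame is unreliable when fewer than `min_reliable_fraction` of its components are marked reliable, and the idea that the discriminative subbands are supplied as a known set of indices. It is not the authors' method, only one plausible reading of "removing unreliable time frames and non-discriminative frequency subbands".

```python
def refine_mask(mask, min_reliable_fraction=0.5, discriminative_bands=None):
    """Refine a binary time-frequency mask (hypothetical helper).

    mask: list of frames; each frame is a list of 0/1 flags, one per
          frequency subband (1 = reliable, 0 = unreliable).
    min_reliable_fraction: assumed threshold; frames with a smaller share
          of reliable components are dropped entirely.
    discriminative_bands: assumed set of subband indices to keep; all other
          subbands are marked unreliable. None keeps every subband.

    Returns (refined_mask, kept_frame_indices).
    """
    refined = []
    kept_frames = []
    for t, frame in enumerate(mask):
        if not frame:
            continue  # skip empty frames
        reliable_fraction = sum(frame) / len(frame)
        if reliable_fraction < min_reliable_fraction:
            continue  # drop frames dominated by unreliable components
        if discriminative_bands is None:
            new_frame = list(frame)
        else:
            # zero out subbands assumed to carry little speaker information
            new_frame = [
                flag if band in discriminative_bands else 0
                for band, flag in enumerate(frame)
            ]
        refined.append(new_frame)
        kept_frames.append(t)
    return refined, kept_frames


# Toy mask: 3 frames x 4 subbands (values are made up for illustration).
toy_mask = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
refined, kept = refine_mask(toy_mask, 0.5, discriminative_bands={0, 1, 2})
print(refined)  # [[1, 1, 0, 0], [1, 1, 1, 0]]
print(kept)     # [0, 2]
```

In the toy example the second frame is dropped because only one of its four components is reliable, and the last subband is masked out in the remaining frames because it is not in the assumed discriminative set.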

Index Terms: speaker verification, missing feature theory, dual-microphone, binary mask estimation


Bibliographic reference.  Zhao, Yali / Xie, Lie / Fu, Zhonghua (2012): "Mask estimation and refinement for MFT-based robust speaker verification", In INTERSPEECH-2012, 2654-2657.