13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

A New Noise-tracking Algorithm for Generalizing Binary Time-frequency (T-F) Masking to Ratio Masking

Shan Liang, Wei Jiang, Wenju Liu

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

In this paper, we attempt to generalize ideal binary mask (IBM) estimation to ideal ratio mask (IRM) estimation. Under binary masking, errors in IBM estimation can severely distort the original speech spectrum. The main purpose of this paper is to use a ratio mask to mitigate this negative impact. Since the key issue is noise tracking, we first use exponential distributions to model the noise power conditioned on the binary mask and the mixture power. We then use a Gaussian distribution to model the correlation between noise estimates in adjacent T-F units. Because the IBM of most units can be estimated correctly, this correlation model can reduce the impact of IBM estimation errors. Systematic experiments show that our algorithm outperforms a common binary-masking-based method in terms of SNR gain and PESQ scores.
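The abstract does not give the model equations, so the following is only a minimal NumPy sketch of the general idea under loudly stated assumptions, not the authors' algorithm. It assumes (i) additive powers in each T-F unit, Y = S + N; (ii) a 0 dB local criterion, so a binary mask of 1 means S > N, i.e. N < Y/2; (iii) the noise power follows an exponential distribution with a known per-channel mean lam, truncated to the interval that the mask and the mixture power allow; and (iv) the ratio mask is taken as S/(S+N). The temporal smoothing with a fixed Gaussian kernel is a simplified stand-in for the paper's Gaussian correlation model (the index terms suggest the paper uses Markov chain Monte Carlo inference instead). The function names and the `smooth_sigma` parameter are hypothetical.

```python
import numpy as np


def truncated_exp_mean(lam, a, b):
    """Mean of an exponential variable with mean `lam`, truncated to [a, b].

    Uses E[N] = lam + (a - b*exp(-d/lam)) / (1 - exp(-d/lam)), d = b - a,
    i.e. the textbook truncated-exponential mean rescaled by exp(a/lam)
    to avoid underflow. Assumes lam > 0.
    """
    lam = np.asarray(lam, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = b - a
    denom = -np.expm1(-d / lam)  # accurate 1 - exp(-d/lam)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = lam + (a - b * np.exp(-d / lam)) / denom
    # Degenerate interval (d == 0): the only possible value is a.
    return np.where(d > 0, mean, a)


def ratio_mask_from_binary(mix_power, binary_mask, noise_mean, smooth_sigma=0.0):
    """Hypothetical sketch: turn an estimated binary mask into a ratio mask.

    mix_power:   (freq, time) mixture power Y, assumed Y = S + N
    binary_mask: (freq, time) 0/1 estimate of 1[S > N] (0 dB criterion)
    noise_mean:  prior mean noise power lam, broadcastable to (freq, time)
    """
    Y = np.asarray(mix_power, dtype=float)
    M = np.asarray(binary_mask).astype(bool)

    # Under Y = S + N, S > N <=> N < Y/2, so the mask bounds the noise power.
    lo = np.where(M, 0.0, Y / 2)
    hi = np.where(M, Y / 2, Y)
    lam = np.broadcast_to(np.asarray(noise_mean, dtype=float), Y.shape)
    n_hat = truncated_exp_mean(lam, lo, hi)  # conditional mean noise power

    if smooth_sigma > 0:
        # Stand-in for a correlation model: Gaussian-weighted average of the
        # noise estimates of neighbouring frames within each channel.
        n_frames = Y.shape[1]
        half = min(int(np.ceil(3 * smooth_sigma)), (n_frames - 1) // 2)
        t = np.arange(-half, half + 1)
        k = np.exp(-0.5 * (t / smooth_sigma) ** 2)
        # Normalise by the kernel mass that falls inside the signal.
        norm = np.convolve(np.ones(n_frames), k, mode="same")
        n_hat = np.array([np.convolve(row, k, mode="same") / norm for row in n_hat])

    # Ratio mask S/(S+N) = 1 - N/Y, with silent units (Y == 0) set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        irm = np.where(Y > 0, 1.0 - n_hat / Y, 0.0)
    return np.clip(irm, 0.0, 1.0)
```

Without smoothing, a unit labelled 1 always gets a ratio of at least 0.5 and a unit labelled 0 at most 0.5, so the mask keeps the binary decision but grades it by how much noise the mixture can plausibly contain. For example, with Y = 4 in three consecutive frames, binary mask [1, 0, 1] and lam = 1, the unsmoothed ratio mask is about [0.83, 0.33, 0.83]; with `smooth_sigma=1.0` the three values move to roughly [0.64, 0.60, 0.64], so the isolated 0, a likely IBM error, is pulled toward its neighbours. This mirrors, in a much cruder form, the abstract's argument that correlations between adjacent units help when most mask labels are correct.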

Index Terms: Ideal Binary Mask, Ideal Ratio Mask, Markov Chain Monte Carlo, Bayesian rule


Bibliographic reference. Liang, Shan / Jiang, Wei / Liu, Wenju (2012): "A new noise-tracking algorithm for generalizing binary time-frequency (t-f) masking to ratio masking", In INTERSPEECH-2012, 951-954.