Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Noise Update Modeling for Speech Enhancement: When Do We Do Enough?

Nitish Krishnamurthy, John H. L. Hansen

University of Texas at Dallas, USA

In speech enhancement, it is generally assumed that updating the noise estimate on a frame-by-frame basis yields the highest level of enhancement performance. However, for many noise types and environmental conditions, a frame-by-frame update is not necessary to achieve superior performance if the noise structure does not change rapidly. For applications where compute/memory resources are limited, better overall speech performance could be achieved by estimating a more reasonable update rate, so that the compute/memory resources saved can be made available to the enhancement algorithm itself. In this study, we propose a framework to model the noise structure with the goal of determining the best update rate required to achieve a given quality for speech enhancement. Speech systems generally develop specialized solutions for noise which are unique to each application (e.g., recognition, speaker ID, enhancement). Here we propose a model to predict the noise update rate required to achieve a given quality for enhancement. We evaluate the algorithm across a corpus of four noise types under different levels of degradation. The error between the mean observed and the mean predicted Itakura-Saito (IS) quality values is typically between 0.06 and 1.78 IS for our model-selected noise update rate of 1 frame every 5 frames using the Log-MMSE enhancement scheme. Finally, we consider mobile and resource-limited applications where such a framework would be useful.
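
As a rough illustration of the trade-off described above, the following sketch compares a noise power-spectrum estimate refreshed on every frame with one refreshed only once every 5 frames, and scores each with the standard Itakura-Saito divergence. It is not the authors' noise model or the Log-MMSE enhancer: the function names `itakura_saito` and `track_noise`, the recursive-averaging tracker, the smoothing constant `alpha`, and the synthetic noise are all assumptions made for illustration only.

```python
import numpy as np

def itakura_saito(p, p_hat, eps=1e-12):
    """Mean Itakura-Saito divergence between two power spectra of equal shape."""
    ratio = (np.asarray(p, dtype=float) + eps) / (np.asarray(p_hat, dtype=float) + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

def track_noise(noise_psd_frames, update_every=1, alpha=0.9):
    """Recursive-average noise tracker (assumed, not from the paper).

    The estimate is refreshed only on frames whose index is a multiple of
    `update_every`; on all other frames the previous estimate is held.
    """
    frames = np.asarray(noise_psd_frames, dtype=float)
    estimate = frames[0].copy()
    out = np.empty_like(frames)
    for t, frame in enumerate(frames):
        if t % update_every == 0:
            estimate = alpha * estimate + (1.0 - alpha) * frame
        out[t] = estimate
    return out

# Synthetic, slowly drifting noise PSD: 200 frames x 64 frequency bins.
rng = np.random.default_rng(0)
drift = 1.0 + 0.5 * np.sin(np.linspace(0.0, np.pi, 200))[:, None]
true_psd = drift * np.ones((200, 64))
observed = true_psd * rng.exponential(1.0, size=true_psd.shape)

for rate in (1, 5):
    est = track_noise(observed, update_every=rate)
    print(f"update every {rate} frame(s): IS = {itakura_saito(true_psd, est):.4f}")
```

When the noise drifts slowly, as in this synthetic case, the two IS scores can be compared directly to judge how much quality is given up for the reduced update cost; for rapidly changing noise the gap would be expected to widen.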


Bibliographic reference.  Krishnamurthy, Nitish / Hansen, John H. L. (2006): "Noise update modeling for speech enhancement: when do we do enough?", In INTERSPEECH-2006, paper 1396-Tue3FoP.6.