Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Sparseness and Speech Perception in Noise

Guoping Li, Mark E. Lutman

University of Southampton, UK

Can speech recognition in noise be modelled using higher-order statistics of the combined speech-and-noise signal, and how do changes in these statistics affect speech perception in noise? This study addresses these questions in two experiments. The first investigated the relationship between an established "glimpsing" model and the fourth-order statistic, kurtosis. The glimpsing model [1] proposes that listeners can exploit the local speech-to-noise ratio (SNR) in short time segments (glimpses) and focus on regions where the SNR is high. The results showed a very high correlation between the percentage of glimpsing area and kurtosis (r = 0.99; p < 0.01), suggesting that kurtosis can serve as a simpler index of glimpsing. The same experiment also examined the association between kurtosis and recognition of nonsense words (vowel-consonant-vowel, VCV) in babble-modulated noise, which likewise showed a very high correlation (r = 0.97; p < 0.01). The second experiment focused on the relationship between sparseness and speech recognition score for VCV words in natural babble noise made up of 100 people talking simultaneously [2]. The results show that kurtosis is also highly correlated with speech recognition score in this noise. Logistic regression analysis, used to estimate the kurtosis corresponding to 50% correct recognition, showed that this level was reached at a kurtosis of approximately 1.0.
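The three quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name and parameter below is hypothetical. It assumes that kurtosis is reported as excess kurtosis (zero for a Gaussian), that the glimpsing area is the fraction of time-frequency cells whose local SNR exceeds a threshold (the 3 dB default is an assumption, not a value given here), and that the logistic model has the usual form p = 1 / (1 + exp(-(b0 + b1 k))), so that 50% correct falls at k = -b0 / b1.

```python
import numpy as np


def excess_kurtosis(signal):
    """Fourth standardized moment minus 3, so a Gaussian signal gives about 0.

    Assumption: the abstract does not say whether 3 is subtracted.
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2 - 3.0)


def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    Inputs are same-shaped arrays of positive power values per cell.
    The threshold default is an illustrative assumption.
    """
    speech = np.asarray(speech_power, dtype=float)
    noise = np.asarray(noise_power, dtype=float)
    local_snr_db = 10.0 * np.log10(speech / noise)
    return float(np.mean(local_snr_db > threshold_db))


def kurtosis_at_half_correct(intercept, slope):
    """Kurtosis at which a fitted logistic curve predicts 50% correct.

    Solves intercept + slope * k = 0 for k.
    """
    return -intercept / slope
```

Under these assumptions, sparser signals (energy concentrated in fewer samples) yield a higher kurtosis, which matches the abstract's use of kurtosis as a sparseness index. Coefficients passed to `kurtosis_at_half_correct` would come from a logistic fit of recognition scores against kurtosis; none are given in the abstract.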


Bibliographic reference. Li, Guoping / Lutman, Mark E. (2006): "Sparseness and speech perception in noise", in INTERSPEECH-2006, paper 1466-Wed3FoP.9.