13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Average Spectrotemporal Structure of Continuous Speech Matches with the Frequency Resolution of Human Hearing

Okko Räsänen

Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, Espoo, Finland

The main goal of the auditory system is to detect and identify incoming sound patterns that are distributed in time and frequency. Since a priori knowledge of the spectrotemporal structure of these patterns is not available, the optimal strategy for the auditory system is to integrate incoming signals in frequency and time according to the average spectrotemporal structure of ecologically relevant stimuli. In the current work, we measure the average spectrotemporal dependencies of continuous speech and show that the dependency structure can be interpreted as an optimal filter matched to the structure of speech, and that the characteristics of the obtained filters are notably similar to the critical bands of human hearing. This result provides further evidence that speech and the auditory system are matched for optimal signaling performance and that the dependency structure is learnable with a single Hebbian-like learning mechanism.

Index Terms: speech perception, auditory perception, statistical learning, sensory plasticity


Bibliographic reference. Räsänen, Okko (2012): "Average spectrotemporal structure of continuous speech matches with the frequency resolution of human hearing", in INTERSPEECH-2012, 1444-1447.