Speech Localisation in a Multitalker Mixture by Humans and Machines

Ning Ma, Guy J. Brown

Speech localisation in multitalker mixtures is affected by the listener’s expectations about the spatial arrangement of the sound sources. This effect was investigated via experiments with human listeners and a machine system, in which the task was to localise a female-voice target among four spatially distributed male-voice maskers. Two configurations were used: the masker locations were either fixed or varied from trial to trial. The machine system uses deep neural networks (DNNs) to learn the relationship between binaural cues and source azimuth, and exploits top-down knowledge about the spectral characteristics of the target source. Performance was examined in both anechoic and reverberant conditions. Our experiments show that the machine system outperformed listeners in some conditions. Both the machine and listeners were able to make use of a priori knowledge about the spatial configuration of the sources, but the effect for headphone listening was smaller than that previously reported for listening in a real room.

DOI: 10.21437/Interspeech.2016-1149

Cite as

Ma, N., Brown, G.J. (2016) Speech Localisation in a Multitalker Mixture by Humans and Machines. Proc. Interspeech 2016, 3359-3363.

author={Ning Ma and Guy J. Brown},
title={Speech Localisation in a Multitalker Mixture by Humans and Machines},
booktitle={Interspeech 2016},