Time Delay Histogram Based Speech Source Separation Using a Planar Array

Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan


Bin-wise time delay is a valuable cue for forming the time-frequency (TF) mask for speech source separation with a two-microphone array. With widely spaced microphones, however, time delay estimation suffers from spatial aliasing. Although a histogram is a simple and effective way to tackle spatial aliasing, it cannot be directly applied to planar arrays. This paper proposes a histogram-based method that separates multiple speech sources with a planar array of arbitrary size while resisting spatial aliasing. A time delay histogram is first used to estimate the delays of multiple sources on each microphone pair. The estimated delays on all pairs are then incorporated into an azimuth histogram by means of a pairwise combination test. From the azimuth histogram, the directions of arrival (DOAs) and the number of sources are obtained. Finally, the TF mask is determined based on the estimated DOAs. Experiments conducted under various conditions confirmed the superiority of the proposed method.
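The idea of using a delay histogram against spatial aliasing can be illustrated with a minimal sketch. This is not the authors' implementation, only one common way such a histogram can be built for a single microphone pair: each frequency bin votes for every delay that is consistent with its wrapped phase difference, so the true delay accumulates votes across frequencies while the aliased candidates scatter. The function name `delay_histogram`, the phase sign convention, the bin count, and the synthetic single-source signal are all assumptions made for illustration.

```python
import numpy as np

def delay_histogram(phase_diff, freqs, max_delay, n_bins=100):
    """Vote all alias-consistent delay candidates into a histogram.

    Assumed convention (hypothetical): phase_diff[i] is the wrapped value
    of 2*pi*freqs[i]*tau for a source with inter-microphone delay tau.
    """
    edges = np.linspace(-max_delay, max_delay, n_bins + 1)
    hist = np.zeros(n_bins)
    for phi, f in zip(phase_diff, freqs):
        if f <= 0:
            continue  # the DC bin carries no delay information
        # Integer phase wraps that can still give |delay| <= max_delay.
        k_max = int(np.ceil(f * max_delay)) + 1
        k = np.arange(-k_max, k_max + 1)
        candidates = (phi + 2 * np.pi * k) / (2 * np.pi * f)
        candidates = candidates[np.abs(candidates) <= max_delay]
        counts, _ = np.histogram(candidates, bins=edges)
        hist += counts
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# Synthetic check: one source, delay 0.41 ms, maximum physical delay 1 ms
# (roughly a 0.34 m spacing), so bins above about 500 Hz are aliased.
true_delay = 0.41e-3
freqs = np.arange(100.0, 4001.0, 100.0)
phase_diff = np.angle(np.exp(1j * 2 * np.pi * freqs * true_delay))
centers, hist = delay_histogram(phase_diff, freqs, max_delay=1e-3)
estimate = centers[np.argmax(hist)]
```

In this sketch the histogram peak `estimate` falls in the bin containing the true delay, because only the true delay is a candidate at every frequency. With several sources, one would look for multiple peaks per pair, which corresponds to the first stage described in the abstract; combining pairs into an azimuth histogram and deriving the TF mask are not covered here.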


DOI: 10.21437/Interspeech.2017-55

Cite as: Huang, Z., Cao, Z., Ying, D., Pan, J., Yan, Y. (2017) Time Delay Histogram Based Speech Source Separation Using a Planar Array. Proc. Interspeech 2017, 1879-1883, DOI: 10.21437/Interspeech.2017-55.


@inproceedings{Huang2017,
  author={Zhaoqiong Huang and Zhanzhong Cao and Dongwen Ying and Jielin Pan and Yonghong Yan},
  title={Time Delay Histogram Based Speech Source Separation Using a Planar Array},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1879--1883},
  doi={10.21437/Interspeech.2017-55},
  url={http://dx.doi.org/10.21437/Interspeech.2017-55}
}