Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction

Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li

SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture using an adaptation utterance. The existing multi-channel SpeakerBeam uses the spectral features of the signals but ignores the spatial discriminability that multi-channel processing offers. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network is then designed to identify the direction of the target speaker and to combine the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam then takes the enhanced signal as input and outputs the mask of the target speaker. Finally, the attention network and SpeakerBeam are jointly trained. Experimental results demonstrate that the proposed scheme improves substantially on the existing multi-channel SpeakerBeam in low signal-to-interference-ratio or same-gender scenarios.
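The front end described in the abstract (fixed beamforming followed by attention-weighted combination of the beams) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the beamformer weights, the `projection` matrix, the speaker embedding, and the dot-product scoring rule are all hypothetical stand-ins filled with random values, whereas the paper uses a trained attention network optimized jointly with SpeakerBeam. The SpeakerBeam mask estimator itself is omitted.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fixed_beamform(stft, beam_weights):
    """Apply a bank of fixed beamformers to a multi-channel STFT.

    stft:         complex array, shape (channels, frames, freqs)
    beam_weights: complex array, shape (beams, channels, freqs);
                  hypothetical fixed filters, one set per look direction
    returns:      complex array, shape (beams, frames, freqs)
    """
    # Filter-and-sum: y_b(t, f) = sum_c conj(w_b(c, f)) * x_c(t, f)
    return np.einsum("bcf,ctf->btf", beam_weights.conj(), stft)


def attention_combine(beams, speaker_embedding, projection):
    """Combine beams with weights derived from a target-speaker embedding.

    Assumption: a plain dot-product score replaces the paper's
    trained attention network.
    """
    # Summarize each beam by its time-averaged log-magnitude spectrum.
    features = np.log1p(np.abs(beams)).mean(axis=1)       # (beams, freqs)
    scores = features @ projection @ speaker_embedding    # (beams,)
    weights = softmax(scores)                             # sums to 1
    enhanced = np.einsum("b,btf->tf", weights, beams)     # (frames, freqs)
    return enhanced, weights


# Toy dimensions and random data, for shape checking only.
rng = np.random.default_rng(0)
C, T, F, B, D = 4, 50, 129, 6, 16
stft = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
beam_weights = np.exp(1j * rng.uniform(0, 2 * np.pi, (B, C, F))) / C
projection = 0.01 * rng.standard_normal((F, D))
speaker_embedding = rng.standard_normal(D)

beams = fixed_beamform(stft, beam_weights)
enhanced, weights = attention_combine(beams, speaker_embedding, projection)
```

In the scheme the abstract describes, `enhanced` would be passed to SpeakerBeam to estimate the target speaker's mask, and the attention weights would be learned rather than computed from random parameters.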

DOI: 10.21437/Interspeech.2019-1474

Cite as: Li, G., Liang, S., Nie, S., Liu, W., Yu, M., Chen, L., Peng, S., Li, C. (2019) Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. Proc. Interspeech 2019, 2713-2717, DOI: 10.21437/Interspeech.2019-1474.

  author={Guanjun Li and Shan Liang and Shuai Nie and Wenju Liu and Meng Yu and Lianwu Chen and Shouye Peng and Changliang Li},
  title={{Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction}},
  booktitle={Proc. Interspeech 2019},