A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement

Minh Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang


In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as well as spectral information, it is challenging to effectively train a multi-channel deep learning system in an end-to-end framework. With a channel-independent encoding architecture for spectral estimation and a strategy to extract spatial information through an inter-channel attention mechanism, we implement a multi-channel speech enhancement system that achieves high performance even in reverberant and extremely noisy environments. Experimental results show that the proposed architecture has superior performance in terms of signal-to-distortion ratio improvement (SDRi), short-time objective intelligibility (STOI), and phoneme error rate (PER) for speech recognition.
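The abstract describes combining channel-independent spectral encoding with an inter-channel attention mechanism for spatial information. The paper's exact formulation is not given here; the following is a minimal sketch of a generic cross-channel attention step, assuming per-channel encoder features of shape (channels, time, features) and a dot-product attention over the channel axis with respect to a reference channel (all function and variable names are hypothetical):

```python
import numpy as np

def cross_channel_attention(feats, ref=0):
    """Generic inter-channel attention sketch (not the paper's exact design).

    feats: (C, T, D) encoder features, one (T, D) map per microphone channel.
    Returns a (T, D) spatially weighted feature map for the reference channel.
    """
    C, T, D = feats.shape
    q = feats[ref]                        # query: reference-channel features, (T, D)
    # frame-wise similarity of each channel to the reference: (C, T)
    scores = np.einsum('td,ctd->ct', q, feats) / np.sqrt(D)
    # softmax over the channel axis (numerically stabilized)
    scores -= scores.max(axis=0, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=0, keepdims=True)     # (C, T), sums to 1 over channels
    # channel-weighted combination of features: (T, D)
    return np.einsum('ct,ctd->td', w, feats)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10, 8))       # 4 channels, 10 frames, 8 features
y = cross_channel_attention(x)
print(y.shape)                            # (10, 8)
```

In an actual end-to-end system the query/key projections would be learned and the result fed back into the Wave-U-Net decoder; this sketch only illustrates how attention weights across microphone channels can fuse spatial information into a single feature map.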


 DOI: 10.21437/Interspeech.2020-2548

Cite as: Ho, M.T., Lee, J., Lee, B.-K., Yi, D.H., Kang, H.-G. (2020) A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement. Proc. Interspeech 2020, 4049-4053, DOI: 10.21437/Interspeech.2020-2548.


@inproceedings{Ho2020,
  author={Minh Tri Ho and Jinyoung Lee and Bong-Ki Lee and Dong Hoon Yi and Hong-Goo Kang},
  title={{A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={4049--4053},
  doi={10.21437/Interspeech.2020-2548},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2548}
}