Atss-Net: Target Speaker Separation via Attention-Based Neural Network

Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li


Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlations between features in parallel and to extract more features using shallower layers. Experimental results show that our Atss-Net yields better performance than VoiceFilter, although it contains only half as many parameters. Furthermore, our proposed model also demonstrates promising performance in speech enhancement.
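
To illustrate what computing correlations between features in parallel can look like, the following is a minimal sketch of generic scaled dot-product self-attention applied to the frames of a spectrogram. It is not the Atss-Net architecture itself: the function name `self_attention`, the projection matrices, the random input, and all dimensions are assumptions chosen purely for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention over the rows (frames) of x.

    Illustrative only; not the layer configuration used in the paper.
    """
    q = x @ w_q                      # queries, shape (frames, d_model)
    k = x @ w_k                      # keys,    shape (frames, d_model)
    v = x @ w_v                      # values,  shape (frames, d_model)
    # One matrix product scores every frame against every other frame.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy input: 6 frames of an 8-bin magnitude spectrogram.
rng = np.random.default_rng(0)
n_frames, n_bins, d_model = 6, 8, 4
spec = rng.standard_normal((n_frames, n_bins))
w_q, w_k, w_v = (rng.standard_normal((n_bins, d_model)) for _ in range(3))

out, weights = self_attention(spec, w_q, w_k, w_v)
```

Here `weights` has one row per frame, and each row sums to 1. Because all pairwise scores come from a single matrix product, no frame has to wait for the previous one to be processed, in contrast to the step-by-step recurrence of an LSTM.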


DOI: 10.21437/Interspeech.2020-1436

Cite as: Li, T., Lin, Q., Bao, Y., Li, M. (2020) Atss-Net: Target Speaker Separation via Attention-Based Neural Network. Proc. Interspeech 2020, 1411-1415, DOI: 10.21437/Interspeech.2020-1436.


@inproceedings{Li2020,
  author={Tingle Li and Qingjian Lin and Yuanyuan Bao and Ming Li},
  title={{Atss-Net: Target Speaker Separation via Attention-Based Neural Network}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1411--1415},
  doi={10.21437/Interspeech.2020-1436},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1436}
}