A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge

Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee


We propose a space-and-speaker-aware iterative mask estimation (SSA-IME) approach that improves beamforming based on the complex angular central Gaussian mixture model (cACGMM) by iteratively leveraging complementary information obtained from SSA-based regression. First, a mask calculated from beamformed speech features is proposed to improve the accuracy of estimating the ideal ratio mask from noisy speech. Second, the cACGMM-beamformed speech, obtained with the given time annotations as initial values, is used to extract log-power spectral and inter-phase difference features for the different speakers, which serve as inputs for training the regression-based SSA model. Finally, in decoding, the mask estimated by the SSA model is used to iteratively refine the cACGMM-based masks, yielding enhanced multi-array speech. Tested on the recent CHiME-6 Challenge Track 1 tasks, the proposed SSA-IME framework significantly and consistently outperforms state-of-the-art approaches and achieves the lowest word error rates for both Track 1 speech recognition tasks.
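As a rough illustration of the quantities mentioned above, the following sketch computes an ideal ratio mask from speech and noise power spectrograms and fuses two time-frequency masks. It is not the authors' implementation: the power-ratio definition is one common form of the ideal ratio mask, and the function names, the `eps` parameter, and the element-wise product used as the fusion rule are all assumptions made for this example, since the abstract does not state how the SSA mask and the cACGMM mask are combined.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, eps=1e-10):
    """One common ideal ratio mask definition: S / (S + N) per time-frequency bin.

    Both inputs are non-negative power spectrograms of the same shape.
    eps (an assumption of this sketch) avoids division by zero in silent bins.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return speech_power / (speech_power + noise_power + eps)


def combine_masks(spatial_mask, model_mask):
    """Hypothetical fusion rule: element-wise product of two masks in [0, 1].

    The actual refinement rule used in SSA-IME is not given in the abstract;
    the product is only a placeholder for illustration.
    """
    spatial_mask = np.asarray(spatial_mask, dtype=float)
    model_mask = np.asarray(model_mask, dtype=float)
    return np.clip(spatial_mask * model_mask, 0.0, 1.0)


if __name__ == "__main__":
    # Toy spectrograms with shape (frames, frequency bins).
    speech = np.array([[1.0, 3.0]])
    noise = np.array([[1.0, 1.0]])
    irm = ideal_ratio_mask(speech, noise)
    print(irm)
    print(combine_masks(irm, np.array([[0.5, 1.0]])))
```

In an iterative scheme of the kind the abstract describes, a fused mask like this would be fed back to re-estimate the spatial model, and the loop would repeat for a fixed number of passes.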


DOI: 10.21437/Interspeech.2020-2150

Cite as: Tu, Y., Du, J., Sun, L., Ma, F., Pan, J., Lee, C. (2020) A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge. Proc. Interspeech 2020, 96-100, DOI: 10.21437/Interspeech.2020-2150.


@inproceedings{Tu2020,
  author={Yan-Hui Tu and Jun Du and Lei Sun and Feng Ma and Jia Pan and Chin-Hui Lee},
  title={{A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={96--100},
  doi={10.21437/Interspeech.2020-2150},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2150}
}