A Signal Processing Approach for Speaker Separation Using SFF Analysis

Nivedita Chennupati, B.H.V.S. Narayana Murthy, B. Yegnanarayana


Multi-speaker separation is necessary to increase the intelligibility of speech signals or to improve the accuracy of speech recognition systems. The ideal binary mask (IBM) has set a gold standard for speech separation by suppressing the undesired speakers and increasing the intelligibility of the desired speech. In this work, single frequency filtering (SFF) analysis is used to estimate a mask close to the IBM for speaker separation. SFF analysis gives good temporal resolution for extracting features such as glottal closure instants (GCIs), and high spectral resolution for resolving harmonics. The temporal resolution of SFF gives impulse locations, which are used to calculate the time delay between two microphone signals. Compensating for this delay reinforces the impulses corresponding to one of the speakers. The spectral resolution of SFF is exploited to estimate the masks from the SFF magnitude spectra of the enhanced impulse-like sequence corresponding to one of the speakers. The estimated mask is used to refine the SFF magnitude. The refined SFF magnitude, together with the phase of the mixed microphone signal, is used to obtain speaker separation. The performance of the proposed algorithm is demonstrated using multi-speaker data collected in a real room environment.
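The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard definition of the ideal binary mask (a time-frequency cell is kept when the target's local energy exceeds the interferer's by a threshold, here assumed to be 0 dB), and it applies the mask to the magnitude of a generic complex time-frequency representation while reusing the mixture phase. The function names, the threshold value, and the toy arrays are all hypothetical; in the paper the mask is estimated from SFF spectra rather than computed from known sources.

```python
import numpy as np


def ideal_binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Return 1.0 where the local target-to-interferer ratio exceeds threshold_db.

    threshold_db = 0 dB is an assumed, commonly used local criterion.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (interferer_mag + eps))
    return (local_snr_db > threshold_db).astype(float)


def apply_mask(mixture, mask):
    """Refine the mixture magnitude with the mask and keep the mixture phase."""
    refined_mag = mask * np.abs(mixture)
    return refined_mag * np.exp(1j * np.angle(mixture))


# Toy 2x2 time-frequency magnitudes (hypothetical values, not real data).
target = np.array([[3.0, 0.5], [0.2, 2.0]])
interferer = np.array([[1.0, 1.5], [0.9, 0.1]])

# Hypothetical complex mixture with a constant phase of 0.3 rad.
mixture = (target + interferer) * np.exp(1j * 0.3)

mask = ideal_binary_mask(target, interferer)
separated = apply_mask(mixture, mask)
```

Cells dominated by the interfering speaker are zeroed, while the retained cells keep both the magnitude and the phase of the mixture, which mirrors the abstract's use of the mixed-signal phase for resynthesis.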


Cite as: Chennupati, N., Murthy, B.N., Yegnanarayana, B. (2017) A Signal Processing Approach for Speaker Separation Using SFF Analysis. Proc. Interspeech 2017, 2034-2035.


@inproceedings{Chennupati2017,
  author={Nivedita Chennupati and B.H.V.S. Narayana Murthy and B. Yegnanarayana},
  title={A Signal Processing Approach for Speaker Separation Using SFF Analysis},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2034--2035}
}