This paper proposes a singing-voice separation system for monaural recordings that cascades spectro-temporal modulation based singing-voice detection with a Viterbi-based pitch tracking algorithm. To detect the singing voice, the spectro-temporal modulation energy related to voice harmonics is extracted using a spectro-temporal modulation analysis framework developed for the Fourier spectrogram. The singing voice is then separated from the background music by a binary mask that groups the estimated harmonics of the singing voice. The proposed system is evaluated on the MIR-1K dataset and is shown to outperform three other binary-mask based systems in the vocal/music separation task.
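The final grouping step can be illustrated with a minimal sketch. It assumes that a per-frame pitch track is already available (with 0 marking frames where no singing voice was detected) and that a time-frequency bin belongs to the voice when it lies within a fixed distance of a harmonic of the pitch. The function names, the `half_width_hz` parameter, and the fixed-width rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def harmonic_binary_mask(f0_hz, n_bins, sample_rate, n_fft, half_width_hz=20.0):
    """Return a boolean mask of shape (n_bins, n_frames).

    f0_hz: per-frame pitch estimates in Hz; values <= 0 mark non-vocal frames.
    half_width_hz: assumed tolerance around each harmonic (hypothetical parameter).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Centre frequency of each STFT bin.
    bin_freqs = np.arange(n_bins) * sample_rate / n_fft
    mask = np.zeros((n_bins, f0_hz.size), dtype=bool)
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # no singing voice detected: keep the whole frame masked out
        # Index of the harmonic nearest to each bin (at least the fundamental).
        harmonic_idx = np.maximum(np.round(bin_freqs / f0), 1)
        distance = np.abs(bin_freqs - harmonic_idx * f0)
        mask[:, t] = distance <= half_width_hz
    return mask


def separate(mixture_stft, mask):
    """Split a mixture STFT into voice and accompaniment with a binary mask."""
    voice = np.where(mask, mixture_stft, 0)
    music = np.where(mask, 0, mixture_stft)
    return voice, music
```

Because the mask is binary, each bin is assigned entirely to either the voice or the accompaniment, so the two outputs of `separate` always sum back to the mixture spectrogram.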
Bibliographic reference. Lin, Tse-En / Hsu, Chung-Chien / Chen, Yi-Cheng / Chen, Jian-Hueng / Chi, Tai-Shih (2013): "Spectro-temporal modulation based singing detection combined with pitch-based grouping for singing voice separation", In INTERSPEECH-2013, 2920-2923.