INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Spectro-Temporal Modulation Based Singing Detection Combined with Pitch-Based Grouping for Singing Voice Separation

Tse-En Lin (1), Chung-Chien Hsu (1), Yi-Cheng Chen (2), Jian-Hueng Chen (2), Tai-Shih Chi (1)

(1) National Chiao Tung University, Taiwan
(2) Chunghwa Telecom Co. Ltd., Taiwan

This paper proposes a spectro-temporal modulation based singing-voice detector cascaded with a Viterbi-based pitch tracking algorithm for singing-voice separation from monaural recordings. To detect the singing voice, the spectro-temporal modulation energy related to voice harmonics is extracted using a spectro-temporal modulation analysis framework developed for the Fourier spectrogram. The singing voice is separated from the background music by using a binary mask to group its estimated harmonics. The proposed system is evaluated on the MIR-1K dataset and is shown to outperform three other binary-mask based systems in the vocal/music separation task.
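
As an illustration only, the pitch-based grouping step can be sketched as a harmonic binary mask over a magnitude spectrogram. The abstract does not give the grouping rule, the frequency tolerance, or the detector and pitch-tracker outputs, so everything below is an assumption: the function names `harmonic_binary_mask` and `apply_mask`, the parameter `tol_hz`, and the inputs `f0_track` (per-frame pitch in Hz) and `voiced` (per-frame singing decision) are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def harmonic_binary_mask(f0_track, freqs, voiced, tol_hz=40.0):
    """Return a boolean (n_bins, n_frames) mask that is True for bins lying
    within tol_hz of a harmonic of the frame's pitch. Frames flagged as
    unvoiced (no singing detected) or with non-positive pitch stay all False.
    tol_hz is an assumed tolerance, not a value taken from the paper."""
    n_bins, n_frames = len(freqs), len(f0_track)
    mask = np.zeros((n_bins, n_frames), dtype=bool)
    for t in range(n_frames):
        f0 = f0_track[t]
        if not voiced[t] or f0 <= 0:
            continue
        # Harmonic frequencies f0, 2*f0, ... up to the highest analysed bin.
        harmonics = np.arange(f0, freqs[-1] + tol_hz, f0)
        # Distance from every bin to its nearest harmonic.
        dist = np.abs(freqs[:, None] - harmonics[None, :]).min(axis=1)
        mask[:, t] = dist <= tol_hz
    return mask

def apply_mask(spec, mask):
    """Split a magnitude spectrogram into masked (vocal) and residual
    (accompaniment) parts; the two parts sum back to the input."""
    return spec * mask, spec * ~mask
```

In this sketch the singing detector gates which frames are masked at all, and the pitch track decides which bins within those frames are assigned to the voice; the remaining bins are left to the accompaniment.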


Bibliographic reference.  Lin, Tse-En / Hsu, Chung-Chien / Chen, Yi-Cheng / Chen, Jian-Hueng / Chi, Tai-Shih (2013): "Spectro-temporal modulation based singing detection combined with pitch-based grouping for singing voice separation", In INTERSPEECH-2013, 2920-2923.