13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition

Antti Hurmalainen (1), Rahim Saeidi (2), Tuomas Virtanen (1)

(1) Department of Signal Processing, Tampere University of Technology, Tampere, Finland
(2) Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands

Spectrogram factorisation using a dictionary of spectro-temporal atoms has been successfully employed to separate a mixed audio signal into its source components. When atoms from multiple sources are included in a combined dictionary, the relative weights of the activated atoms reveal the likely sources as well as the content of each source. Enforcing sparsity on the activation weights produces solutions in which only a small number of atoms are active at a time. In this paper we propose using group sparsity to restrict the simultaneous activation of sources. This allows us to discover the identity of an unknown speaker among multiple candidates, and further to recognise the phonetic content more reliably with a narrowed-down subset of atoms belonging to the most likely speakers. An evaluation on the CHiME corpus shows that the use of group sparsity improves the results of noise-robust speaker identification and speech recognition using speaker-dependent models.

Index Terms: group sparsity, speech recognition, speaker identification, spectrogram factorization
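
The idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the exact penalty, so the code assumes a common mixed-norm form (the sum over speaker groups of the L2 norm of each group's activations), and the speaker labels, atom indices and activation values are hypothetical toy data. It shows that, for the same total activation weight, this penalty is lower when the weight is concentrated in one speaker's atoms, and that the speaker can then be picked as the group with the largest summed weight.

```python
import math


def group_weights(activations, groups):
    """Sum the non-negative activation weights of each speaker's atoms."""
    return {spk: sum(activations[i] for i in idx) for spk, idx in groups.items()}


def group_sparsity_penalty(activations, groups):
    """Assumed penalty: sum over groups of the L2 norm of the group's activations."""
    return sum(
        math.sqrt(sum(activations[i] ** 2 for i in idx)) for idx in groups.values()
    )


def most_likely_speaker(activations, groups):
    """Pick the speaker whose atoms carry the largest total activation weight."""
    weights = group_weights(activations, groups)
    return max(weights, key=weights.get)


# Hypothetical toy dictionary: six atoms, two per candidate speaker.
groups = {"spk_a": [0, 1], "spk_b": [2, 3], "spk_c": [4, 5]}

# Two activation vectors with the same total weight (same L1 norm).
spread = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]        # weight split across all speakers
concentrated = [3.0, 3.0, 0.0, 0.0, 0.0, 0.0]  # weight within one speaker's atoms

penalty_spread = group_sparsity_penalty(spread, groups)
penalty_concentrated = group_sparsity_penalty(concentrated, groups)
identified = most_likely_speaker(concentrated, groups)
```

A plain L1 sparsity penalty would score both toy vectors identically, which is why a group-level term is needed to discourage activating several speakers at once.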

Bibliographic reference. Hurmalainen, Antti / Saeidi, Rahim / Virtanen, Tuomas (2012): "Group sparsity for speaker identity discrimination in factorisation-based speech recognition", in INTERSPEECH-2012, 2138-2141.