13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity

Ryo Yokoyama (1), Yu Nasu (1), Koichi Shinoda (1), Koji Iwano (2)

(1) Department of Computer Science, Tokyo Institute of Technology, Japan
(2) Faculty of Environmental and Information Studies, Tokyo City University, Japan

We propose an overlapped speech detection method for speech recognition and speaker diarization of meetings in which each speaker wears a lapel microphone. Two novel features are used as inputs to a GMM-based detector. The first is the speech power after cross-channel spectral subtraction, which reduces the power contributed by the other speakers. The second is an amplitude spectral cosine correlation coefficient, which effectively extracts the correlation of spectral components under relatively quiet conditions. We evaluated the method on a meeting speech corpus with four participants. The proposed method achieved an accuracy of 74.1%, significantly better than the 67.0% of the conventional method, which uses raw speech power and the power spectral Pearson's correlation coefficient.

Index Terms: overlap speech detection, spectral subtraction, cosine distance
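
The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the details below are assumptions made only for illustration: the subtraction removes the summed power of all other channels, scaled by a hypothetical factor `alpha` and floored at `floor`, and the similarity is a plain cosine between two spectra. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def cross_channel_subtracted_power(power, alpha=1.0, floor=0.0):
    """Per-channel power after subtracting the other channels' power.

    power: array of shape (channels, bins), one power spectrum per
    lapel microphone for a single frame.
    alpha, floor: assumed scaling factor and spectral floor.
    """
    power = np.asarray(power, dtype=float)
    total = power.sum(axis=0, keepdims=True)
    others = total - power  # summed power of all other channels, per bin
    residual = np.maximum(power - alpha * others, floor)
    return residual.sum(axis=1)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two spectra (no mean removal)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def pearson_correlation(a, b, eps=1e-12):
    """Pearson's correlation: cosine similarity of mean-centered spectra."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean(), eps)


if __name__ == "__main__":
    # Two channels, two frequency bins; each channel dominates one bin.
    frame_power = [[4.0, 1.0], [1.0, 4.0]]
    print(cross_channel_subtracted_power(frame_power))

    # Amplitude spectra of two channels for the same frame (toy values).
    amp_a = np.sqrt([4.0, 1.0, 9.0])
    amp_b = np.sqrt([1.0, 4.0, 9.0])
    print(cosine_similarity(amp_a, amp_b), pearson_correlation(amp_a, amp_b))
```

The only difference between the two similarity functions is the mean removal: Pearson's coefficient centers each spectrum before comparing, while the cosine coefficient compares the spectra as they are. In a full detector, features of this kind would be computed per frame and passed to the GMM-based classifier.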


Bibliographic reference. Yokoyama, Ryo / Nasu, Yu / Shinoda, Koichi / Iwano, Koji (2012): "Overlapped speech detection in meeting using cross-channel spectral subtraction and spectrum similarity", in INTERSPEECH-2012, 1500-1503.