13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Developing a Speech Activity Detection System for the DARPA RATS Program

Tim Ng (1), Bing Zhang (1), Long Nguyen (1), Spyros Matsoukas (1), Xinhui Zhou (2), Nima Mesgarani (2), Karel Veselý (3), Pavel Matějka (3)

(1) Raytheon BBN Technologies, Cambridge, MA, USA
(2) University of Maryland, College Park, MD, USA
(3) Brno University of Technology, Brno, Czech Republic

This paper describes the speech activity detection (SAD) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present two approaches to SAD, one based on Gaussian mixture models and one based on multi-layer perceptrons. We show that significant gains in SAD accuracy can be obtained through careful design of the acoustic front end, feature normalization, incorporation of long-span features via data-driven dimensionality-reducing transforms, and channel-dependent modeling. We also present a novel technique for normalizing detection scores from different systems for the purpose of system combination.
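As a rough illustration of the Gaussian mixture model approach named above, the following is a minimal sketch of frame-level SAD by log-likelihood ratio between a speech GMM and a non-speech GMM. The abstract does not give the system's actual features, model sizes, or decision rule, so everything here is an assumption made for illustration: the function names, the diagonal covariances, the zero threshold, and the toy two-dimensional parameters are all hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, weights, means, variances):
    """Log likelihood of x under a GMM, using log-sum-exp for numerical stability."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def detect_speech(frames, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Mark a frame as speech when its log-likelihood ratio exceeds the threshold."""
    labels = []
    for x in frames:
        llr = log_gmm(x, *speech_gmm) - log_gmm(x, *nonspeech_gmm)
        labels.append(llr > threshold)
    return labels

# Toy, hand-picked parameters (hypothetical, not taken from the paper).
# Each model is a tuple of (weights, means, variances).
SPEECH_GMM = ([0.5, 0.5], [[2.0, 2.0], [3.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
NONSPEECH_GMM = ([1.0], [[-2.0, -2.0]], [[1.0, 1.0]])

if __name__ == "__main__":
    frames = [[2.5, 1.5], [-2.0, -2.1]]
    print(detect_speech(frames, SPEECH_GMM, NONSPEECH_GMM))
```

In a real detector the models would be trained on labeled audio, and the per-frame decisions would typically be smoothed over time before segments are reported; neither step is shown here.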

Index Terms: speech activity detection, noisy speech

Bibliographic reference. Ng, Tim / Zhang, Bing / Nguyen, Long / Matsoukas, Spyros / Zhou, Xinhui / Mesgarani, Nima / Veselý, Karel / Matějka, Pavel (2012): "Developing a speech activity detection system for the DARPA RATS program", In INTERSPEECH-2012, 1969-1972.