Odyssey 2012 - The Speaker and Language Recognition Workshop

June 25-28, 2012

Exploring the Impact of Advanced Front-End Processing on NIST Speaker Recognition Microphone Tasks

William M. Campbell, Doug Sturim, Bengt Jonas Borgström, Robert Dunn, Alan McCree, Thomas F. Quatieri, Douglas A. Reynolds

MIT Lincoln Laboratory, Lexington, MA, USA

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. This data has typically been preprocessed and used at telephone bandwidth and with telephone quantization. Although this approach is viable, it ignores the richer properties of the microphone data: multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.

Bibliographic reference.  Campbell, William M. / Sturim, Doug / Borgström, Bengt Jonas / Dunn, Robert / McCree, Alan / Quatieri, Thomas F. / Reynolds, Douglas A. (2012): "Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks", In Odyssey-2012, 180-186.