Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact

Wei Xia, John H.L. Hansen

Speech, speaker and language systems have traditionally relied on carefully collected speech material for training acoustic models. An overwhelming abundance of publicly accessible audio material is now available for training. A major challenge, however, is that such found data is not professionally recorded and may therefore contain a wide diversity of background noise, nonlinear distortions, or other unknown environment-based contamination or mismatch. There is a critical need for automatic analysis to screen such unknown data sets before acoustic model development, or to screen input audio for purity prior to classification. In this study, we propose a waveform-based clipping detection algorithm for naturalistic audio streams and analyze the impact of clipping at different severities on speech quality measures and automatic speaker recognition systems. We use the TIMIT and NIST SRE-08 corpora as case studies. The results show, as expected, that clipping introduces a nonlinear distortion into clean speech data, which reduces both speech quality and speaker recognition performance. We also investigate how much clipping can be tolerated while sustaining effective speech system performance. The proposed detection system, which will be released, could contribute to building massive new audio collections for speech and language technology development.
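To make the notion of clipping severity concrete, the following minimal sketch simulates hard clipping on a synthetic tone and estimates how much of the waveform sits at the peak amplitude. It is not the detection algorithm proposed in the paper, which the abstract does not describe in detail. The function names `hard_clip` and `clipped_fraction`, the tolerance `tol`, the 440 Hz test tone, and the threshold values are all assumptions chosen for illustration.

```python
import math


def hard_clip(signal, threshold):
    """Limit every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, x)) for x in signal]


def clipped_fraction(signal, tol=1e-9):
    """Return the fraction of samples at the peak absolute amplitude.

    Naive heuristic (an assumption, not the paper's method): in a hard-clipped
    waveform many samples pile up at the same maximum magnitude, so a large
    fraction suggests clipping.
    """
    if not signal:
        return 0.0
    peak = max(abs(x) for x in signal)
    if peak == 0.0:
        return 0.0
    at_peak = sum(1 for x in signal if abs(x) >= peak - tol)
    return at_peak / len(signal)


# Synthetic stand-in for speech: one second of a 440 Hz sine at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# A lower threshold means more severe clipping.
for threshold in (1.0, 0.8, 0.5, 0.2):
    ratio = clipped_fraction(hard_clip(tone, threshold))
    print(f"threshold={threshold:.1f}  clipped fraction={ratio:.3f}")
```

For a pure sine, the fraction of samples at the peak grows as the threshold drops, which gives a simple severity scale. Real recordings would need a more robust detector, since post-clipping filtering, resampling, or lossy coding can smear the flat tops that this heuristic relies on.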

DOI: 10.21437/Interspeech.2018-2430

Cite as: Xia, W., Hansen, J.H. (2018) Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact. Proc. Interspeech 2018, 746-750, DOI: 10.21437/Interspeech.2018-2430.

  author={Wei Xia and John H.L. Hansen},
  title={Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact},
  booktitle={Proc. Interspeech 2018},