Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition

Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan

Insufficient spectral smoothing during front-end speech parametrization results in pitch-induced distortions in the short-time magnitude spectra. This, in turn, degrades the performance of an automatic speech recognition (ASR) system for high-pitched speakers. Motivated by this fact, a non-uniform spectral smoothing algorithm is proposed in this paper to mitigate the acoustic mismatch resulting from pitch differences. In the proposed technique, the speech utterance is first segmented into vowel and non-vowel regions. The short-time magnitude spectrum obtained by the discrete Fourier transform is then processed through a single-pole low-pass filter with different pole values for vowel and non-vowel regions. Sufficiently smoothed spectra are obtained by using higher pole values for vowels and lower values for non-vowel regions. The Mel-frequency cepstral coefficients computed from the resulting smoothed spectra are observed to be less affected by pitch variations. To validate this claim, an ASR system is developed on speech from adult speakers and evaluated on a test set that consists of children's speech, simulating large pitch differences. The experimental evaluations as well as the signal-domain analyses presented in this paper support the claim.
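The smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-frame magnitude spectra and the vowel/non-vowel labels are already available, and it uses one common form of a single-pole low-pass recursion applied along the frequency axis. The names `smooth_spectrum` and `smooth_utterance` and the pole values `VOWEL_POLE` and `NON_VOWEL_POLE` are hypothetical; the abstract states only that vowels receive a higher pole value than non-vowel regions, not the values themselves.

```python
from typing import List, Sequence

# Hypothetical pole values (assumption): the abstract only says that vowel
# frames use a higher pole than non-vowel frames, not the actual numbers.
VOWEL_POLE = 0.9
NON_VOWEL_POLE = 0.5


def smooth_spectrum(magnitude: Sequence[float], pole: float) -> List[float]:
    """Smooth one magnitude spectrum with a single-pole low-pass filter.

    The recursion runs over frequency bins k:
        y[k] = (1 - pole) * x[k] + pole * y[k - 1]
    The filter state is initialised to x[0] (an assumption of this sketch)
    so that a flat spectrum passes through unchanged.
    """
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must lie in [0, 1)")
    smoothed: List[float] = []
    if not magnitude:
        return smoothed
    prev = magnitude[0]
    for x in magnitude:
        prev = (1.0 - pole) * x + pole * prev
        smoothed.append(prev)
    return smoothed


def smooth_utterance(
    frames: Sequence[Sequence[float]], is_vowel: Sequence[bool]
) -> List[List[float]]:
    """Apply non-uniform smoothing: one pole for vowels, another otherwise.

    `frames` holds one magnitude spectrum per analysis frame and `is_vowel`
    holds the matching vowel/non-vowel label, assumed to come from a
    separate segmentation step that is not shown here.
    """
    if len(frames) != len(is_vowel):
        raise ValueError("frames and is_vowel must have the same length")
    return [
        smooth_spectrum(frame, VOWEL_POLE if vowel else NON_VOWEL_POLE)
        for frame, vowel in zip(frames, is_vowel)
    ]
```

A pole closer to 1 attenuates rapid bin-to-bin ripple more strongly, which is why the higher value is reserved for vowel frames, where pitch harmonics are most prominent. The causal recursion also shifts spectral features slightly toward higher bins; a forward-backward pass would avoid that, but whether the paper does so is not stated in the abstract.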

DOI: 10.21437/Interspeech.2018-1828

Cite as: Yadav, I.C., Kumar, A., Shahnawazuddin, S., Pradhan, G. (2018) Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition. Proc. Interspeech 2018, 1601-1605, DOI: 10.21437/Interspeech.2018-1828.

  author={Ishwar Chandra Yadav and Avinash Kumar and Syed Shahnawazuddin and Gayadhar Pradhan},
  title={Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition},
  booktitle={Proc. Interspeech 2018},