Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario

S. Shahnawazuddin, Nagaraj Adiga, Kunal Kumar, Aayushi Poddar, Waquar Ahmad


Automatic recognition of children’s speech is a challenging research problem for several reasons. One of them is the unavailability of large amounts of speech data from child speakers for developing automatic speech recognition (ASR) systems that employ deep learning architectures. Training on a limited amount of data limits the power of the learned system. To overcome this issue, we have explored ways to effectively make use of adults’ speech data for training an ASR system. For that purpose, generative adversarial network (GAN)-based voice conversion (VC) is exploited to modify the acoustic attributes of adults’ speech so that it becomes perceptually similar to children’s speech. The original and converted speech samples from adult speakers are then pooled together to learn the statistical model parameters. A significantly improved recognition rate for children’s speech is noted due to VC-based data augmentation. To further enhance the recognition rate, a limited amount of children’s speech data is also pooled into training, and a large reduction in error rate is observed in this case as well. It is worth mentioning that GAN-based VC does not change the speaking rate. To demonstrate the need to deal with speaking-rate differences, we report results of time-scale modification of the children’s speech test data.
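The abstract notes that GAN-based VC leaves the speaking rate unchanged, motivating a separate time-scale modification step on the test data. The paper does not specify the algorithm here; the sketch below is a toy overlap-add (OLA) time stretcher in NumPy, illustrating only the general idea of changing duration without resampling the waveform (a real system would use WSOLA or a phase vocoder to avoid artifacts). All function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def time_stretch_ola(x, rate, frame_len=1024, hop=256):
    """Naive overlap-add time-scale modification (illustrative only).

    rate > 1 shortens the signal (faster speech), rate < 1 lengthens it.
    The analysis hop advances through the input at `hop * rate` samples
    while frames are laid down at a fixed synthesis hop, changing the
    duration while roughly preserving the local spectral content.
    """
    ana_hop = int(round(hop * rate))
    win = np.hanning(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * hop + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)  # window-sum for amplitude normalization
    for i in range(n_frames):
        frame = x[i * ana_hop : i * ana_hop + frame_len]
        if len(frame) < frame_len:
            break
        y[i * hop : i * hop + frame_len] += frame * win
        norm[i * hop : i * hop + frame_len] += win
    norm[norm < 1e-8] = 1.0  # avoid division by zero at the edges
    return y / norm

# Example: lengthen a 1-second synthetic tone by ~11% (rate = 0.9),
# mimicking a slow-down of fast adult-like test speech.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)
y = time_stretch_ola(x, rate=0.9)
```

The output length is approximately `len(x) / rate`; in practice the stretch factor would be chosen from the speaking-rate mismatch between adult training data and child test data.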


DOI: 10.21437/Interspeech.2020-1112

Cite as: Shahnawazuddin, S., Adiga, N., Kumar, K., Poddar, A., Ahmad, W. (2020) Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario. Proc. Interspeech 2020, 4382-4386, DOI: 10.21437/Interspeech.2020-1112.


@inproceedings{Shahnawazuddin2020,
  author={S. Shahnawazuddin and Nagaraj Adiga and Kunal Kumar and Aayushi Poddar and Waquar Ahmad},
  title={{Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4382--4386},
  doi={10.21437/Interspeech.2020-1112},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1112}
}