Towards Feature-space Emotional Speech Adaptation for TDNN based Telugu ASR systems

Vishnu Vidyadhara Raju V, Krishna Gurugubelli, Mirishkar Sai Ganesh, Anil Kumar Vuppala


The unavailability of speech corpora is one of the critical barriers to building a large-vocabulary naturalistic Telugu automatic speech recognition (ASR) system. Hence, both neutral and emotional speech samples were collected to create the Telugu naturalistic emotional speech corpus (IIIT-H TNESC). In this work, we investigate a feature-space adaptation approach that uses auxiliary features to compensate for the acoustic mismatch between neutral and emotional speech. Features derived from feature-space maximum likelihood linear regression (fMLLR) of GMM models are used to perform the adaptation. The effectiveness of this adaptation is studied on deep neural network (DNN), time-delay neural network (TDNN), and combined TDNN with long short-term memory (TDNN-LSTM) acoustic models. Experimental results show that the feature-space adaptation approach improves on the baseline performance by an average of 15.8% in word error rate.
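
As a rough illustration of the feature-space adaptation mentioned above: fMLLR applies an affine transform x' = A x + b to each acoustic feature frame, with A and b estimated under a GMM-based model. The sketch below shows only the application step. The function name `apply_fmllr`, the toy 2-dimensional frames, and the values of `A` and `b` are hypothetical and chosen for illustration; the abstract does not specify how the transforms are estimated or how the adapted features are supplied to the DNN, TDNN, and TDNN-LSTM models.

```python
from typing import List, Sequence


def apply_fmllr(
    frames: Sequence[Sequence[float]],
    A: Sequence[Sequence[float]],
    b: Sequence[float],
) -> List[List[float]]:
    """Apply the affine transform x' = A x + b to every feature frame.

    A and b are assumed to be already estimated (for example with a
    GMM-HMM system); estimation itself is not shown here.
    """
    adapted = []
    for x in frames:
        # One output dimension per row of A: dot(row, x) + bias.
        adapted.append(
            [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(A, b)]
        )
    return adapted


# Toy example with hypothetical values (real features have far more dimensions).
frames = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.5, -1.0]
adapted = apply_fmllr(frames, A, b)
```

In a full system, one such transform would typically be estimated per speaker or per acoustic condition, and the transformed frames would replace or accompany the original features at the acoustic model input.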


DOI: 10.21437/SMM.2019-4

Cite as: V, V.V.R., Gurugubelli, K., Ganesh, M.S., Vuppala, A.K. (2019) Towards Feature-space Emotional Speech Adaptation for TDNN based Telugu ASR systems. Proc. SMM19, Workshop on Speech, Music and Mind 2019, 16-20, DOI: 10.21437/SMM.2019-4.


@inproceedings{V2019,
  author={Vishnu Vidyadhara Raju V and Krishna Gurugubelli and Mirishkar Sai Ganesh and Anil Kumar Vuppala},
  title={{Towards Feature-space Emotional Speech Adaptation for TDNN based Telugu ASR systems}},
  year=2019,
  booktitle={Proc. SMM19, Workshop on Speech, Music and Mind 2019},
  pages={16--20},
  doi={10.21437/SMM.2019-4},
  url={http://dx.doi.org/10.21437/SMM.2019-4}
}