Robust Mizo digit recognition using data augmentation and tonal information

Biswajit Dev Sarma, Abhishek Dey, Wendy Lalhminghlui, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna


The performance of speech recognition systems degrades severely in noisy environments. In this work, we present a method to improve the performance of a Mizo digit recognition system under different noisy conditions using data augmentation and tonal information. Mizo is a tonal language, and each digit in Mizo is spoken with one of the four tones present in the language, so the tone carries information about the spoken digit. Tone is related to the excitation source, and excitation source information is more robust to noisy conditions than vocal tract information. The normalized cross-correlation function, pitch, and pitch dynamics are used as additional features to represent the tonal information, and they improve on Mel frequency cepstral coefficient (MFCC) based baseline systems in noisy conditions. Data augmentation is another technique used in the literature for robust speech recognition; its use further improves the performance of Mizo digit recognition.
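
To make two of the ideas named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names `nccf` and `add_noise_at_snr`, the frame length, the lag range, and the choice of additive noise at a target signal-to-noise ratio (SNR) as the augmentation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def nccf(frame, max_lag):
    """Normalized cross-correlation of a frame with lagged copies of itself.

    Hypothetical helper: returns one value per lag in 0..max_lag, each in
    [-1, 1]. A peak at a non-zero lag suggests a pitch period of that many
    samples.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - max_lag          # length of the comparison window
    ref = frame[:n]
    ref_energy = np.dot(ref, ref)
    out = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        seg = frame[lag:lag + n]
        denom = np.sqrt(ref_energy * np.dot(seg, seg))
        out[lag] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return out


def add_noise_at_snr(clean, noise, snr_db):
    """Mix noise into a clean signal at a target SNR in dB.

    Assumed augmentation scheme (additive noise); the noise is tiled or
    truncated to the length of the clean signal before scaling.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), len(clean))
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


if __name__ == "__main__":
    fs = 8000                                   # assumed sampling rate (Hz)
    t = np.arange(400) / fs                     # one 50 ms frame
    tone = np.sin(2 * np.pi * 200 * t)          # synthetic 200 Hz "voiced" frame

    values = nccf(tone, max_lag=100)
    lag = 20 + int(np.argmax(values[20:61]))    # search lags 20..60 only
    print("estimated pitch (Hz):", fs / lag)

    rng = np.random.default_rng(0)
    noisy = add_noise_at_snr(tone, rng.standard_normal(400), snr_db=10)
    print("noisy frame length:", len(noisy))
```

In a sketch like this, the per-frame NCCF peak value and the pitch estimate (plus its frame-to-frame difference as a stand-in for pitch dynamics) would be appended to the MFCC vector, while the noise-mixing function would be applied to training utterances to produce additional noisy copies.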


DOI: 10.21437/SpeechProsody.2018-126

Cite as: Sarma, B.D., Dey, A., Lalhminghlui, W., Gogoi, P., Sarmah, P., Prasanna, S.R.M. (2018) Robust Mizo digit recognition using data augmentation and tonal information. Proc. 9th International Conference on Speech Prosody 2018, 621-625, DOI: 10.21437/SpeechProsody.2018-126.


@inproceedings{Sarma2018,
  author={Biswajit Dev Sarma and Abhishek Dey and Wendy Lalhminghlui and Parismita Gogoi and Priyankoo Sarmah and S R Mahadeva Prasanna},
  title={Robust Mizo digit recognition using data augmentation and tonal information},
  year=2018,
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},
  pages={621--625},
  doi={10.21437/SpeechProsody.2018-126},
  url={http://dx.doi.org/10.21437/SpeechProsody.2018-126}
}