Robust Mizo Continuous Speech Recognition

Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha, Nirmala S.R.

Mizo is an under-resourced tonal language spoken mainly in North-East India. It has four canonical tones along with tone sandhi. A majority of Mizo words carry tone information and, as a result, the language exhibits higher acoustic variability, as do other tonal languages. In this work, we investigate the impact of tonal information on robust Mizo continuous speech recognition (CSR). First, separate baseline CSR systems are developed using Mel-frequency cepstral coefficient (MFCC) based acoustic features and salient acoustic modeling paradigms. To improve on these baselines, tonal information is then incorporated into each CSR system. For this purpose, 3-dimensional tonal features are derived, comprising pitch, pitch-difference, and probability-of-voicing values. Our experimental study reveals that including tonal information enhances the robustness of the Mizo CSR system across all acoustic modeling paradigms. We attribute this trend to the fundamental frequency information degrading less than the vocal tract information under noisy conditions.
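
As a rough illustration of the 3-dimensional tonal features described in the abstract, the sketch below stacks pitch, pitch-difference, and probability of voicing per frame and appends them to an MFCC matrix. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `tonal_features` and `append_tonal` are hypothetical, the pitch track and voicing probabilities are assumed to come from some external pitch tracker, and the pitch-difference is assumed here to be a simple first-order difference between consecutive frames.

```python
import numpy as np


def tonal_features(f0, pov):
    """Return an (n_frames, 3) array: pitch, pitch-difference, voicing probability.

    Assumption: pitch-difference is the first-order frame-to-frame difference,
    with the first frame's difference set to 0. The abstract does not specify
    the exact definition used in the paper.
    """
    f0 = np.asarray(f0, dtype=float)
    pov = np.asarray(pov, dtype=float)
    if f0.ndim != 1 or f0.shape != pov.shape:
        raise ValueError("f0 and pov must be 1-D arrays of equal length")
    if f0.size == 0:
        return np.empty((0, 3))
    # Prepending the first value makes the first difference exactly 0.
    delta_f0 = np.diff(f0, prepend=f0[0])
    return np.column_stack([f0, delta_f0, pov])


def append_tonal(mfcc, f0, pov):
    """Append the 3 tonal dimensions to each frame of an MFCC matrix."""
    mfcc = np.asarray(mfcc, dtype=float)
    tonal = tonal_features(f0, pov)
    if mfcc.ndim != 2 or mfcc.shape[0] != tonal.shape[0]:
        raise ValueError("mfcc must be 2-D with one row per pitch frame")
    return np.hstack([mfcc, tonal])


# Toy example: 4 frames of 13-dimensional MFCCs (zeros, as placeholders).
mfcc = np.zeros((4, 13))
f0 = [100.0, 110.0, 105.0, 105.0]   # hypothetical pitch track in Hz
pov = [0.9, 0.8, 0.7, 0.6]          # hypothetical voicing probabilities
features = append_tonal(mfcc, f0, pov)
```

With 13-dimensional MFCCs, each frame of `features` has 16 dimensions. In practice the pitch values would usually be normalized before being passed to the acoustic model; that step is omitted here because the abstract does not describe it.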

DOI: 10.21437/Interspeech.2018-2125

Cite as: Dey, A., Sarma, B.D., Lalhminghlui, W., Ngente, L., Gogoi, P., Sarmah, P., Prasanna, S.R.M., Sinha, R., S.R., N. (2018) Robust Mizo Continuous Speech Recognition. Proc. Interspeech 2018, 1036-1040, DOI: 10.21437/Interspeech.2018-2125.

  author={Abhishek Dey and Biswajit Dev Sarma and Wendy Lalhminghlui and Lalnunsiami Ngente and Parismita Gogoi and Priyankoo Sarmah and S R Mahadeva Prasanna and Rohit Sinha and Nirmala {S.R.}},
  title={Robust Mizo Continuous Speech Recognition},
  booktitle={Proc. Interspeech 2018},