Phase Based Spectro-Temporal Features for Building a Robust ASR System

Anirban Dutta, G. Ashishkumar, Ch.V. Rama Rao


Spectro-temporal features have proven robust in speech recognition. However, they are derived from the magnitude spectrum of the complex Fourier Transform (FT). In this work, we investigate whether phase information can substitute for the magnitude spectrum in spectro-temporal features. We compared several state-of-the-art phase spectra and evaluated their performance in different noisy environments. We found that the Modified Group Delay (MODGD) spectrum closely resembles the structure of the power spectrum. On average, the MODGD-based spectro-temporal features show a relative performance difference of only 0.03% compared to the magnitude-based features. The analysis shows that phase can indeed carry information equivalent or complementary to that of magnitude-based spectro-temporal features.
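For readers unfamiliar with the MODGD spectrum, the following is a minimal sketch of one common formulation, not the authors' implementation. For a frame x[n] with Fourier transform X(w), let Y(w) be the transform of n*x[n] and S(w) a cepstrally smoothed version of |X(w)|. Then tau(w) = (X_R Y_R + X_I Y_I) / S(w)^(2*gamma), and the MODGD value is sign(tau) * |tau|^alpha. The function name `modgd_spectrum` and the values of `alpha`, `gamma`, `lifter` and `n_fft` are assumptions chosen for illustration; the paper's actual settings are not given here.

```python
import numpy as np


def modgd_spectrum(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=30, eps=1e-10):
    """Hypothetical sketch of a modified group delay (MODGD) spectrum for one frame.

    alpha, gamma, lifter and n_fft are assumed illustrative values,
    not parameters taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))

    # X(w): FT of x[n]; Y(w): FT of n * x[n] (both zero-padded to n_fft)
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)

    # S(w): cepstrally smoothed magnitude spectrum.
    # Real cepstrum of log|X|, then keep only the low-quefrency coefficients
    # (symmetrically, so the cepstrum stays even and S stays real).
    log_mag = np.log(np.abs(X) + eps)
    cep = np.fft.irfft(log_mag, n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(cep, n_fft).real)

    # Group delay numerator with the smoothed spectrum in the denominator
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + eps)

    # Compress the dynamic range while preserving the sign
    return np.sign(tau) * np.abs(tau) ** alpha
```

As a sanity check, a unit impulse delayed by n0 samples has a flat group delay equal to n0, so the sketch returns roughly n0**alpha in every bin. Stacking the per-frame outputs over time would give a MODGD "spectrogram" that could, under these assumptions, stand in for the power spectrogram as input to a spectro-temporal feature extractor.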


 DOI: 10.21437/Interspeech.2020-2258

Cite as: Dutta, A., Ashishkumar, G., Rao, C.R. (2020) Phase Based Spectro-Temporal Features for Building a Robust ASR System. Proc. Interspeech 2020, 1668-1672, DOI: 10.21437/Interspeech.2020-2258.


@inproceedings{Dutta2020,
  author={Anirban Dutta and G. Ashishkumar and Ch.V. Rama Rao},
  title={{Phase Based Spectro-Temporal Features for Building a Robust ASR System}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1668--1672},
  doi={10.21437/Interspeech.2020-2258},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2258}
}