An Alternative to MFCCs for ASR

Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur


The Mel scale is the most commonly used frequency warping function for extracting features for automatic speech recognition (ASR) and is known to be quite effective. However, it was not specifically designed for ASR acoustic models based on deep neural networks (DNNs). In this study, we introduce a frequency warping function that is a modified version of the Mel scale. This warping function has two parameters, and we use it to propose a new set of features called modified Mel-frequency cepstral coefficients (MFCC), which use cosine-shaped filters. The filter bandwidths are computed using a new function. Evaluating the proposed features on a variety of ASR data sets, we see consistent improvements over regular MFCCs and (log) Mel filter bank energies.
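For readers unfamiliar with frequency warping, the following is a minimal sketch of the conventional Mel scale that the abstract takes as its baseline. It is not the modified two-parameter warping function proposed in the paper, whose form is not given here. The constants 1127 and 700 are those of one widely used Mel formula, and the function names and the helper `mel_center_frequencies` are hypothetical, chosen only for illustration.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to the conventional Mel scale (baseline, not the paper's warp)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_center_frequencies(num_filters: int, low_hz: float, high_hz: float) -> list[float]:
    """Return filter center frequencies (Hz) spaced uniformly on the Mel axis.

    Hypothetical helper: the centers exclude the two band edges, as is common
    when laying out a filter bank between low_hz and high_hz.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    # Example layout: 23 filters between 20 Hz and 8000 Hz (illustrative values only).
    for center in mel_center_frequencies(23, 20.0, 8000.0):
        print(f"{center:8.1f} Hz")
```

Because the centers are uniform on the warped axis, they are dense at low frequencies and sparse at high frequencies on the Hz axis. A different warping function would change only `hz_to_mel` and its inverse in a sketch like this one.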


DOI: 10.21437/Interspeech.2020-2690

Cite as: Ghahramani, P., Hadian, H., Povey, D., Hermansky, H., Khudanpur, S. (2020) An Alternative to MFCCs for ASR. Proc. Interspeech 2020, 1664-1667, DOI: 10.21437/Interspeech.2020-2690.


@inproceedings{Ghahramani2020,
  author={Pegah Ghahramani and Hossein Hadian and Daniel Povey and Hynek Hermansky and Sanjeev Khudanpur},
  title={{An Alternative to MFCCs for ASR}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1664--1667},
  doi={10.21437/Interspeech.2020-2690},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2690}
}