Analysis of sparse representation based feature on speech mode classification

Kumud Tripathi, K. Sreenivasa Rao

Traditional phone recognition systems are developed using read speech. In practice, however, the speech that a machine must process is not always in read mode. To handle phone recognition in realistic scenarios, this study therefore considers three broad modes of speech: read, conversation, and extempore. The conversation mode covers informal communication between two or more individuals in an unconstrained environment. In the extempore mode, a person speaks with confidence without the help of notes. Read mode is a formal type of speech produced in a rigid environment. In this work, we propose a sparse representation based feature for speech mode classification. The effectiveness of a sparse representation depends on the dictionary. Therefore, we learn multiple overcomplete dictionaries using the parallel atom-update dictionary learning (PAU-DL) technique to capture the discriminative characteristics of the considered speech modes. Sparse features corresponding to the sequence of speech frames are then derived using the learned dictionary by applying the orthogonal matching pursuit (OMP) algorithm. The proposed sparse features are evaluated on speech corpora covering six Indian languages by classifying speech modes. In these experiments, the proposed sparse features outperform standard spectral, excitation source, and prosodic features.
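As a rough illustration of the sparse coding step described above, the following is a minimal sketch of OMP applied to a single frame. It is not the authors' implementation: the dictionary here is random rather than learned with PAU-DL, only one dictionary is used, and the frame dimension (39), dictionary size (128), and sparsity level (3) are assumptions chosen for the example.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy OMP: approximate x by D @ a with at most n_nonzero nonzero entries in a."""
    a = np.zeros(D.shape[1])
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never reselect an atom already in the support
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    a[support] = coef
    return a

rng = np.random.default_rng(0)

# Hypothetical overcomplete dictionary: 39-dimensional frames, 128 unit-norm atoms.
# In the paper the dictionaries are learned with PAU-DL; a random one stands in here.
D = rng.standard_normal((39, 128))
D /= np.linalg.norm(D, axis=0)

# Synthetic "frame" built from three atoms, standing in for a real feature vector.
x = D[:, [5, 40, 99]] @ np.array([1.0, -0.7, 0.5])

a = omp(D, x, n_nonzero=3)  # sparse feature vector for this frame
```

The vector `a` has one entry per dictionary atom and at most three nonzero entries. Applying the same routine to each frame of an utterance yields a sequence of sparse vectors that could then be passed to a classifier.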

DOI: 10.21437/Interspeech.2018-1921

Cite as: Tripathi, K., Rao, K.S. (2018) Analysis of sparse representation based feature on speech mode classification. Proc. Interspeech 2018, 731-735, DOI: 10.21437/Interspeech.2018-1921.

  author={Kumud Tripathi and K. Sreenivasa Rao},
  title={Analysis of sparse representation based feature on speech mode classification},
  booktitle={Proc. Interspeech 2018},