Attention Based Hybrid i-Vector BLSTM Model for Language Recognition

Bharat Padi, Anand Mohan, Sriram Ganapathy

This paper proposes a hybrid i-vector neural network framework (i-BLSTM) that models the sequence information present in a series of short-segment i-vectors for spoken language recognition (LRE). A sequence of short-segment i-vectors is extracted for every speech utterance and then modeled using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). An attention mechanism inside the network weights the segments of the utterance, so the model learns to give higher weights to the parts of the speech data that are more helpful for classification. The proposed framework performs better than the conventional i-vector system on short-duration and noisy speech. Experiments are performed on clean, noisy and multi-speaker speech data from the NIST LRE 2017 and RATS language recognition corpora. In these experiments, the proposed approach yields significant improvements (relative improvements of 7.6–13% in accuracy for noisy conditions) over both the conventional i-vector based language recognition approach and an end-to-end LSTM-RNN based approach.
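
The attention weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the dot-product scoring against a `query` vector, the function names, and the toy numbers are all assumptions made for illustration. The segment vectors stand in for per-segment BLSTM outputs, and the BLSTM itself is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segment_vectors, query):
    """Pool per-segment vectors into a single utterance-level vector.

    Assumption: each segment is scored by its dot product with a `query`
    vector (which would be learned in a real model). The scores are
    normalised with softmax, and the segments are summed using the
    resulting weights.
    """
    scores = [
        sum(v * q for v, q in zip(vec, query)) for vec in segment_vectors
    ]
    weights = softmax(scores)
    dim = len(segment_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, segment_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Toy stand-ins for the outputs of a sequence model over three segments.
segments = [[0.1, 0.0], [2.0, 1.0], [0.2, -0.1]]
query = [1.0, 1.0]  # hypothetical query vector, fixed here for illustration
weights, pooled = attention_pool(segments, query)
print(weights)  # the second segment receives the largest weight
print(pooled)
```

In this sketch the weights sum to one, so segments that score higher against the query contribute more to the utterance-level vector, which is the behaviour the abstract attributes to the attention mechanism.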

DOI: 10.21437/Interspeech.2019-2371

Cite as: Padi, B., Mohan, A., Ganapathy, S. (2019) Attention Based Hybrid i-Vector BLSTM Model for Language Recognition. Proc. Interspeech 2019, 1263-1267, DOI: 10.21437/Interspeech.2019-2371.

  author={Bharat Padi and Anand Mohan and Sriram Ganapathy},
  title={{Attention Based Hybrid i-Vector BLSTM Model for Language Recognition}},
  booktitle={Proc. Interspeech 2019},