Out of Set Language Modelling in Hierarchical Language Identification

Saad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li

This paper proposes a novel approach to the open set language identification task by introducing out of set (OOS) language modelling in a Hierarchical Language Identification (HLID) framework. Most recent language identification systems rely on data from languages other than the target languages to model OOS languages. The proposed approach does not require such data; instead, it models OOS languages using only data from the target languages. Additionally, a diverse language selection method is incorporated to further improve OOS language modelling. This work also proposes a new training data selection method for developing compact models in a hierarchical framework. Experiments are conducted on the recent NIST LRE 2015 data set. When the proposed hierarchical OOS modelling is used, the overall results show relative improvements in Cavg of 32.9% and 30.1% over the corresponding baseline systems with and without the diverse language selection method, respectively.
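The results above are stated as relative reductions in Cavg. The abstract does not say which Cavg variant or parameter values the paper uses. The sketch below therefore assumes the open-set form from the NIST LRE 2009 evaluation plan. For each target language L_T, that form sums three parts:

- a miss term, C_miss * P_target * P_miss(L_T);
- false-alarm terms, C_fa * P_non-target * P_fa(L_T, L_N), one for every other target language L_N;
- an out-of-set false-alarm term, C_fa * P_oos * P_fa(L_T, L_O).

Here P_non-target = (1 - P_target - P_oos) / (N_L - 1), and the per-language costs are averaged over the N_L target languages. The function names, the dictionary layout, the toy numbers and the default values (C_miss = C_fa = 1, P_target = 0.5, P_oos = 0.2) are illustrative assumptions, not the authors' scoring code.

```python
from typing import Mapping


def open_set_cavg(
    p_miss: Mapping[str, float],
    p_fa: Mapping[str, Mapping[str, float]],
    p_fa_oos: Mapping[str, float],
    p_target: float = 0.5,   # assumed LRE 2009-style prior, not taken from the paper
    p_oos: float = 0.2,      # assumed out-of-set prior; 0.0 gives the closed-set cost
    c_miss: float = 1.0,
    c_fa: float = 1.0,
) -> float:
    """Hypothetical helper: open-set average detection cost (Cavg).

    p_miss[t]    -- miss rate for target language t
    p_fa[t][n]   -- rate at which trials of target language n are accepted as t
    p_fa_oos[t]  -- rate at which out-of-set trials are accepted as t
    """
    langs = list(p_miss)
    n_l = len(langs)
    if n_l < 2:
        raise ValueError("need at least two target languages")
    # Prior mass left for the in-set non-target languages, shared equally.
    p_non_target = (1.0 - p_target - p_oos) / (n_l - 1)
    total = 0.0
    for t in langs:
        cost = c_miss * p_target * p_miss[t]
        cost += sum(c_fa * p_non_target * p_fa[t][n] for n in langs if n != t)
        cost += c_fa * p_oos * p_fa_oos[t]
        total += cost
    return total / n_l


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a cost such as Cavg relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy error rates for a hypothetical three-language target set.
    langs = ["eng", "fre", "spa"]
    fa = {t: {n: 0.05 for n in langs if n != t} for t in langs}
    base = open_set_cavg({"eng": 0.10, "fre": 0.20, "spa": 0.15}, fa,
                         {t: 0.30 for t in langs})
    prop = open_set_cavg({"eng": 0.10, "fre": 0.20, "spa": 0.15}, fa,
                         {t: 0.10 for t in langs})
    print(f"Cavg baseline={base:.4f} proposed={prop:.4f} "
          f"relative improvement={relative_improvement(base, prop):.1f}%")
```

The second toy system differs from the first only in its out-of-set false-alarm rate. This isolates the OOS term, which is the part of the cost that OOS modelling targets. As a reading aid, a relative improvement of 32.9% means the proposed system's Cavg is about 67.1% of its baseline's Cavg.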

DOI: 10.21437/Interspeech.2016-558

Cite as

Irtza, S., Sethu, V., Fernando, S., Ambikairajah, E., Li, H. (2016) Out of Set Language Modelling in Hierarchical Language Identification. Proc. Interspeech 2016, 3270-3274.

author={Saad Irtza and Vidhyasaharan Sethu and Sarith Fernando and Eliathamby Ambikairajah and Haizhou Li},
title={Out of Set Language Modelling in Hierarchical Language Identification},
booktitle={Interspeech 2016},