EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Multi-Stream Statistical N-Gram Modeling with Application to Automatic Language Identification

Katrin Kirchhoff, Sonia Parandekar

University of Washington, USA

Most state-of-the-art automatic language identification systems are based on phonotactic information, i.e., languages are identified on the basis of probabilities of phone sequences extracted from the acoustic signal. This approach ignores the potential advantages to be gained from a richer representation of the acoustic signal in terms of parallel streams of subphonemic events. In this paper we develop an alternative approach to language identification which is based on parallel streams of phonetic features and sparse modeling of statistical dependencies between these streams. We present results on the OGI-TS database and show that the feature-based system significantly outperforms a comparable phone-based system while using fewer parameters. Moreover, the feature-based system performs markedly better on very short test signals (< 3 seconds).
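
The general idea of scoring parallel symbol streams with per-language n-gram models can be illustrated with a minimal sketch. This is not the system described in the paper: it assumes bigram models with add-one smoothing, and it treats the streams as independent by simply summing their log-likelihoods, whereas the paper models sparse statistical dependencies between streams. The stream names ("voicing", "manner"), the symbol inventories, the language labels "A" and "B", and all training sequences are invented for illustration.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sequence-start symbol


def train_bigram(sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    vocab_size = len(vocab)

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(logprob, seq):
    """Log-likelihood of one symbol sequence under one bigram model."""
    padded = [START] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def identify(models, streams):
    """Sum per-stream log-likelihoods (independence assumption) per language."""
    totals = {
        lang: sum(score(stream_models[name], seq) for name, seq in streams.items())
        for lang, stream_models in models.items()
    }
    return max(totals, key=totals.get), totals


# Invented toy data: two feature streams for two hypothetical languages.
vocabs = {
    "voicing": {"v", "u"},
    "manner": {"stop", "vowel", "nasal", "fric"},
}
training = {
    "A": {
        "voicing": [["v", "v", "u", "v"], ["v", "v", "v", "u"]],
        "manner": [["stop", "vowel", "stop", "vowel"],
                   ["stop", "vowel", "nasal", "vowel"]],
    },
    "B": {
        "voicing": [["u", "u", "v", "u"], ["u", "u", "u", "v"]],
        "manner": [["fric", "fric", "vowel", "fric"],
                   ["fric", "vowel", "fric", "fric"]],
    },
}
models = {
    lang: {name: train_bigram(seqs, vocabs[name]) for name, seqs in streams.items()}
    for lang, streams in training.items()
}

test_streams = {
    "voicing": ["v", "v", "u"],
    "manner": ["stop", "vowel", "stop"],
}
best, totals = identify(models, test_streams)
print(best, totals)
```

A conventional phonotactic system corresponds to the special case of a single stream of phone labels. Because each feature stream has a much smaller symbol inventory than a phone set, the per-stream n-gram tables are small, which is one plausible reading of how a feature-based system can use fewer parameters.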


Bibliographic reference. Kirchhoff, Katrin / Parandekar, Sonia (2001): "Multi-stream statistical n-gram modeling with application to automatic language identification", in EUROSPEECH-2001, 803-806.