Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Language Identification Using Wavelets

Ana Lilia Reyes-Herrera, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez

INAOE, Mexico

Spoken language identification consists of recognizing a language from a sample of speech produced by an unknown speaker. The traditional approach to this task relies mainly on the phonotactic information of languages. However, for marginalized languages (languages with few speakers, or oral languages without a fixed writing standard), this information is rarely available, and consequently the usual approach is not applicable. In this paper, we present a method that considers only the acoustic features of the speech signal and does not use any kind of linguistic information. The method applies a wavelet transform to extract these acoustic features. Experimental results on a pairwise discrimination task among nine languages show that this approach considerably outperforms previous methods that rely solely on acoustic features.
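
As a rough illustration of what wavelet-based acoustic feature extraction can look like, the sketch below decomposes one frame of samples with a Haar wavelet and summarizes each sub-band by its energy. The abstract does not state which wavelet family, how many decomposition levels, or which derived features the method uses, so the Haar wavelet, the three levels, the band-energy features, and the names `haar_step` and `wavelet_band_energies` are all assumptions made here for illustration only.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. Assumes an even number of samples.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_band_energies(frame, levels=3):
    """Hypothetical feature vector (not taken from the paper): the energy of
    each detail band, followed by the energy of the final approximation band."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


# Toy frame standing in for a window of speech samples (synthetic, not real data).
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = wavelet_band_energies(frame, levels=3)
```

Because the Haar transform used here is orthonormal, the band energies sum to the energy of the original frame, which gives a quick sanity check. In a pairwise discrimination setting, vectors of this kind computed over many frames could be passed to any binary classifier; the choice of classifier is likewise not specified in the abstract.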

Bibliographic reference.  Reyes-Herrera, Ana Lilia / Villaseñor-Pineda, Luis / Montes-y-Gómez, Manuel (2006): "Automatic language identification using wavelets", In INTERSPEECH-2006, paper 1998-Mon2CaP.1.