EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

Preliminary Experiments on Language Identification Using Broadcast News Recordings

Laurent Benarousse, Edouard Geoffrois

DGA/CTA/GIP, France

This article presents experiments on language identification using Broadcast News recordings, for which large amounts of data are available. The system uses a Broadcast News partitioner developed by LIMSI to extract the speech segments from the raw signals. These segments are then transcribed using a language-independent HMM acoustic model. Phonotactic models are trained for each language and used to score the transcriptions of the test signals. Training was conducted on recordings from three monolingual radio stations (about 17 h of signal per language), and tests were run on signals from other radio stations. We also investigated a rejection strategy to improve the identification results. Without any rejection, the error rates range from 13.8% (5 s segments) to 4.3% (45 s segments). Rejecting one third of the data reduces the error rate by 78% for 10 s segments.
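The scoring and rejection steps can be illustrated with a minimal sketch. The abstract does not specify the n-gram order, the smoothing, or the rejection criterion, so everything below is an assumption made for illustration: the transcription is treated as a sequence of phone symbols, each language gets an add-one smoothed bigram model, and a segment is rejected when the gap between the two best per-phone log-likelihoods falls below a threshold. The names `BigramPhonotacticModel`, `identify`, and `reject_margin` are hypothetical, and the training data is a toy example, not data from the paper.

```python
import math
from collections import Counter


class BigramPhonotacticModel:
    """Add-one smoothed phone bigram model (assumed stand-in, not the paper's model)."""

    def __init__(self, sequences, phone_set):
        self.vocab_size = len(phone_set)
        self.bigrams = Counter()   # counts of (previous phone, current phone)
        self.contexts = Counter()  # counts of previous phone
        for seq in sequences:
            padded = ["<s>"] + list(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def avg_log_prob(self, seq):
        """Per-phone log-likelihood, so segments of different lengths are comparable."""
        padded = ["<s>"] + list(seq)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total / max(len(seq), 1)


def identify(models, phones, reject_margin=0.0):
    """Return (language or None, scores); None means the segment was rejected."""
    scores = {lang: model.avg_log_prob(phones) for lang, model in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    # Assumed rejection rule: require a minimum gap between the two best scores.
    margin = scores[best] - scores[ranked[1]] if len(ranked) > 1 else float("inf")
    if margin < reject_margin:
        return None, scores
    return best, scores


# Toy data: two made-up "languages" over a three-phone inventory.
phone_set = {"a", "b", "c"}
models = {
    "L1": BigramPhonotacticModel([list("ababab"), list("abab")], phone_set),
    "L2": BigramPhonotacticModel([list("acacac"), list("caca")], phone_set),
}

decision, scores = identify(models, list("abab"))
print(decision, scores)

# A large margin requirement forces a rejection of the same segment.
rejected, _ = identify(models, list("abab"), reject_margin=2.0)
print(rejected)
```

Raising `reject_margin` trades coverage for accuracy: more segments are left undecided, and the remaining decisions are those where one language model clearly outscores the others.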

Bibliographic reference. Benarousse, Laurent / Geoffrois, Edouard (2001): "Preliminary experiments on language identification using broadcast news recordings", in EUROSPEECH-2001, 799-802.