EUROSPEECH 2001 Scandinavia
Though the Slovenian SpeechDat(II) database is the largest spoken language resource for Slovenian ever recorded, it is among the smaller speech data collections made available by the European LE2-4001 project (http://www.speechdat.org/). The aim of this paper is to analyze this new Slovenian resource and explore the possibilities of supplementing it with data recorded for other languages. The donor languages considered are English, German, and Danish. For each of these languages, four times as much speech data has been recorded (4000 speakers, compared to 1000 speakers in the Slovenian database). Our purely data-driven cross-language tests show that porting data across languages raises serious problems. These problems are partly due to differences in the recording conditions (telephone line noise). Others can be explained by the different phonological structures of the analyzed languages.
Bibliographic reference. Iskra, Andrej / Petek, Bojan / Brøndsted, Tom (2001): "Recognition of Slovenian speech: within and cross-language experiments on monophones using the SpeechDat(II)", In EUROSPEECH-2001, 2777-2780.