13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Arabic Dialect Identification - "Is the Secret in the Silence?" and Other Observations

Hynek Bořil, Abhijeet Sangwan, John H. L. Hansen

Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering, University of Texas at Dallas, Richardson, Texas, USA

Conversational telephone speech (CTS) collections of Arabic dialects distributed through the Linguistic Data Consortium (LDC) provide an invaluable resource for the development of robust speech systems, including speaker and speech recognition, translation, spoken dialogue modeling, and information summarization. They are also frequently relied on in language identification (LID) and dialect identification (DID) evaluations. The first part of this study attempts to identify the source of the relatively high DID performance on LDC's Arabic CTS corpora seen in recent literature. It is found that recordings of each dialect exhibit unique channel and noise characteristics and that silence regions alone are sufficient for performing reasonably accurate DID. The second part focuses on phonotactic dialect modeling that utilizes phone recognizers and support vector machines (PRSVM). A new N-gram normalization of PRSVM input supervectors is introduced and shown to outperform the standard approach used in current LID and DID systems.
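
For readers unfamiliar with PRSVM input supervectors, the following is a minimal sketch of how a phone-recognizer output can be turned into an N-gram supervector. It is an illustration only: the abstract does not describe the proposed normalization or name the standard one, so the sketch shows plain relative-frequency normalization followed by a TFLLR-style scaling, one commonly used choice in phonotactic SVM systems. The function names, the toy phone inventory, and the uniform background probabilities are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def ngram_supervector(phones, inventory, n=2):
    """Relative-frequency N-gram vector over a fixed phone inventory.

    Assumes every symbol in `phones` belongs to `inventory`.
    """
    # Fixed ordering of all possible N-grams defines the vector dimensions.
    grams = list(product(inventory, repeat=n))
    counts = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0 for g in grams]


def tfllr_scale(vector, background):
    """Divide each entry by the square root of its background probability.

    `background` holds N-gram probabilities estimated over all training data
    (assumed here; entries with zero background probability are set to 0).
    """
    return [v / math.sqrt(b) if b > 0 else 0.0 for v, b in zip(vector, background)]


# Toy example: two-phone inventory, bigram supervector.
inventory = ["a", "b"]
phones = ["a", "b", "a", "b"]
vec = ngram_supervector(phones, inventory, n=2)
scaled = tfllr_scale(vec, [0.25] * len(vec))
```

The scaled vectors would then serve as input features to an SVM trained per dialect; any alternative normalization would replace only the `tfllr_scale` step.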

Index Terms: Arabic dialect identification, channel characteristics, LDC corpora, PRSVM

Bibliographic reference. Bořil, Hynek / Sangwan, Abhijeet / Hansen, John H. L. (2012): "Arabic dialect identification - 'is the secret in the silence?' and other observations", in INTERSPEECH-2012, 30-33.