Duration modeling using DNN for Arabic speech synthesis

Imene Zangar, Zied Mnasri, Vincent Colotte, Denis Jouvet, Amal Houidhek

Duration modeling is a key task for every parametric speech synthesis system. Although such parametric systems have been adapted to many languages, no special attention has been paid to explicitly handling the characteristics of Arabic speech. In Arabic, phoneme duration plays a distinctive role because of consonant gemination and vowel quantity; precise modeling of sound durations is therefore critical. In this paper, we compare several approaches to modeling phoneme durations (including the duration models of the HTS and MERLIN toolkits), and we propose a new approach that relies on a set of models, each one optimal for a given phoneme class (e.g., simple consonants, geminated consonants, short vowels, and long vowels). An objective evaluation carried out on a set of test sentences shows that the proposed approach leads to more accurate modeling of phoneme durations.
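The class-specific idea can be illustrated with a minimal sketch. It shows only the dispatch structure: one duration model per phoneme class, each trained on that class's samples and selected at prediction time by the phone's class. It is not the authors' DNN setup. The class names, the `Phone` fields, the single numeric feature, and the toy data are all hypothetical. The trainer `fit_linear_1d` is a simple least-squares stand-in for a per-class DNN. RMSE is assumed here as the objective measure, since the abstract does not name one.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical class labels mirroring the four example classes in the abstract.
CLASSES = ("simple_consonant", "geminated_consonant", "short_vowel", "long_vowel")


@dataclass
class Phone:
    label: str
    phone_class: str             # one of CLASSES
    features: Tuple[float, ...]  # hypothetical contextual features
    duration_ms: float = 0.0     # reference duration (known for training/test data)


# A trained duration model maps a feature vector to a predicted duration (ms).
Model = Callable[[Tuple[float, ...]], float]


def fit_linear_1d(samples: List[Phone]) -> Model:
    """Stand-in trainer (NOT a DNN): least-squares line on the first feature."""
    xs = [p.features[0] for p in samples]
    ys = [p.duration_ms for p in samples]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = 0.0 if var == 0 else cov / var
    intercept = my - slope * mx
    return lambda f: intercept + slope * f[0]


def train_per_class(
    train: Sequence[Phone],
    trainer: Callable[[List[Phone]], Model] = fit_linear_1d,
) -> Dict[str, Model]:
    """Group training phones by class and fit one model per class."""
    groups: Dict[str, List[Phone]] = {}
    for p in train:
        groups.setdefault(p.phone_class, []).append(p)
    return {cls: trainer(phones) for cls, phones in groups.items()}


def predict(models: Dict[str, Model], phones: Sequence[Phone]) -> List[float]:
    """Route each phone to its class model (KeyError if the class was unseen)."""
    return [models[p.phone_class](p.features) for p in phones]


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference durations."""
    return sqrt(mean((a - b) ** 2 for a, b in zip(pred, ref)))


# Toy usage with made-up data (durations in ms).
train_set = [
    Phone("b", "simple_consonant", (0.0,), 60), Phone("b", "simple_consonant", (1.0,), 80),
    Phone("bb", "geminated_consonant", (0.0,), 120), Phone("bb", "geminated_consonant", (1.0,), 140),
    Phone("a", "short_vowel", (0.0,), 50), Phone("a", "short_vowel", (1.0,), 70),
    Phone("aa", "long_vowel", (0.0,), 110), Phone("aa", "long_vowel", (1.0,), 150),
]
test_set = [
    Phone("b", "simple_consonant", (0.5,), 72),
    Phone("aa", "long_vowel", (0.5,), 128),
]

models = train_per_class(train_set)
preds = predict(models, test_set)
print(rmse(preds, [p.duration_ms for p in test_set]))  # → 2.0
```

Passing the trainer as a parameter keeps the per-class structure independent of the model type, so a different regressor (for example, a neural network) could be swapped in per class without changing the dispatch or evaluation code.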

DOI: 10.21437/SpeechProsody.2018-121

Cite as: Zangar, I., Mnasri, Z., Colotte, V., Jouvet, D., Houidhek, A. (2018) Duration modeling using DNN for Arabic speech synthesis. Proc. 9th International Conference on Speech Prosody 2018, 597-601, DOI: 10.21437/SpeechProsody.2018-121.

