2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)

Penang, Malaysia
September 11-12, 2014

Preparation of MaDiTS Corpus for Malay Dialect Translation and Speech Synthesis System

Yen-Min Jasmina Khaw, Tien-Ping Tan

School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia

This paper presents our work on acquiring a corpus for Malay dialect translation and speech synthesis. We propose an architecture for speech corpus acquisition that includes Malay dialect translation and Malay dialect grapheme-to-phoneme (G2P) conversion. The pronunciation dictionary for dialectal Malay was generated with a G2P tool. Because dialectal Malay is a scarce-resource language variety, dialectal translation rules were developed to translate standard Malay text into dialectal Malay. Kelantanese Malay, a highly distinctive dialect spoken in Kelantan in the northeast of Peninsular Malaysia, was chosen for this research. Evaluation results show that the sentences selected with the proposed approach have a correlation coefficient of about 0.99, which means that the selected set is phonetically well balanced.

Index Terms: Malay dialect translation, Malay dialect grapheme to phoneme, speech synthesis corpus
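
The abstract reports a correlation coefficient as evidence of phonetic balance but does not state which two quantities are correlated. One common reading, assumed here, is that the phoneme frequency distribution of the selected sentences is compared with that of the larger source text using Pearson's correlation. The sketch below illustrates that assumption only; the function names, the toy phoneme inventory, and the toy data are hypothetical and do not come from the MaDiTS corpus.

```python
from collections import Counter
from math import sqrt


def phoneme_distribution(utterances, inventory):
    """Return the relative frequency of each phoneme in `inventory`.

    `utterances` is a list of phoneme sequences (lists of strings).
    Phonemes outside `inventory` are ignored.
    """
    counts = Counter(p for utt in utterances for p in utt)
    total = sum(counts[p] for p in inventory)
    if total == 0:
        raise ValueError("no phonemes from the inventory were found")
    return [counts[p] / total for p in inventory]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical toy data: phoneme sequences for a source text and a selected subset.
inventory = ["a", "k", "m", "o"]
source_text = [["m", "a", "k", "a"], ["k", "o", "m", "a"], ["a", "m", "o"]]
selected = [["m", "a", "k", "a"], ["a", "m", "o"]]

balance = pearson(
    phoneme_distribution(source_text, inventory),
    phoneme_distribution(selected, inventory),
)
```

Under this reading, a value of `balance` close to 1 indicates that the selected sentences reproduce the phoneme distribution of the source text closely.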

