In this paper, we investigate multi-speaker, multilingual speech synthesis for four Indic languages (Hindi, Marathi, Gujarati, Bengali) as well as English in a fully convolutional, attention-based model. We show how factored embeddings can allow cross-lingual transfer and investigate methods to adapt the model in a low-resource scenario for Marathi and Gujarati. We also report how effectively the model scales to a new language and how much data is required to train the system on it.
DOI: 10.21437/Interspeech.2018-1869
Cite as: Baljekar, P., Rallabandi, S., Black, A.W. (2018) An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages. Proc. Interspeech 2018, 2474-2478, DOI: 10.21437/Interspeech.2018-1869.
@inproceedings{Baljekar2018,
  author={Pallavi Baljekar and SaiKrishna Rallabandi and Alan W Black},
  title={An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2474--2478},
  doi={10.21437/Interspeech.2018-1869},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1869}
}