Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM

Takuya Kishida, Shin Tsukamoto, Toru Nakashika


In this paper, we propose a multiple-domain adaptive restricted Boltzmann machine (MDARBM) for simultaneous conversion of speaker identity and emotion. This study is motivated by the assumption that explicitly representing multiple domains of speech (e.g., speaker identity, emotion, accent) in a single model helps reduce the effects of the other domains when the model learns one domain’s characteristics. The MDARBM decomposes the visible-hidden connections of an RBM into domain-specific factors and a domain-independent factor, making it adaptable to multiple domains of speech. By switching the domain-specific factors from the source speaker and emotion to the target ones, the model can perform a simultaneous conversion. Experimental results showed that the target-domain conversion task was enhanced by the other domain in the simultaneous conversion framework. In a two-domain conversion task, the MDARBM outperformed a combination of ARBMs trained independently with speaker-identity and emotion units.
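
To make the factorized-weight idea concrete, below is a minimal NumPy sketch of the conversion procedure described in the abstract. It is illustrative only: it assumes a Bernoulli-Bernoulli RBM whose effective visible-hidden weights compose per-speaker and per-emotion adaptation matrices with a shared, domain-independent factor. The class name MDARBM, the factorization order, and all dimensions are hypothetical, training (e.g., contrastive divergence) is omitted, and the paper's exact formulation may differ.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MDARBM:
    """Illustrative multiple-domain adaptive RBM (hypothetical formulation)."""

    def __init__(self, n_visible, n_hidden, n_speakers, n_emotions):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # domain-independent factor
        self.A_spk = np.stack([np.eye(n_visible)] * n_speakers)     # speaker-specific factors
        self.A_emo = np.stack([np.eye(n_visible)] * n_emotions)     # emotion-specific factors
        self.b = np.zeros(n_visible)                                 # visible bias
        self.c = np.zeros(n_hidden)                                  # hidden bias

    def effective_W(self, spk, emo):
        # Compose the domain-specific factors with the shared factor
        # (ordering chosen for illustration).
        return self.A_spk[spk] @ self.A_emo[emo] @ self.W

    def hidden_probs(self, v, spk, emo):
        return sigmoid(v @ self.effective_W(spk, emo) + self.c)

    def visible_probs(self, h, spk, emo):
        return sigmoid(h @ self.effective_W(spk, emo).T + self.b)

    def convert(self, v, src_spk, src_emo, tgt_spk, tgt_emo):
        # Encode with the source-domain factors, decode with the target-domain
        # factors: this is the "switching" step described in the abstract.
        h = self.hidden_probs(v, src_spk, src_emo)
        return self.visible_probs(h, tgt_spk, tgt_emo)

# Usage: convert a dummy feature frame from speaker 0 / emotion 0 to speaker 1 / emotion 2.
model = MDARBM(n_visible=24, n_hidden=64, n_speakers=4, n_emotions=3)
frame = rng.random(24)
converted = model.convert(frame, src_spk=0, src_emo=0, tgt_spk=1, tgt_emo=2)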


DOI: 10.21437/Interspeech.2020-2262

Cite as: Kishida, T., Tsukamoto, S., Nakashika, T. (2020) Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM. Proc. Interspeech 2020, 3431-3435, DOI: 10.21437/Interspeech.2020-2262.


@inproceedings{Kishida2020,
  author={Takuya Kishida and Shin Tsukamoto and Toru Nakashika},
  title={{Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3431--3435},
  doi={10.21437/Interspeech.2020-2262},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2262}
}