ISCA Archive Interspeech 2013

Generalizing continuous-space translation of paralinguistic information

Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

In previous work, we proposed a model for speech-to-speech translation that is sensitive to paralinguistic information such as the duration and power of spoken words. This model uses linear regression to map source acoustic features directly to target acoustic features in continuous space. However, while the model is effective, it faces scalability issues: a separate model must be trained for every word, which makes it difficult to generalize to words for which we do not have parallel speech. In this work we first demonstrate that simply training a single linear regression model on all words is not sufficient to express paralinguistic translation. We then describe a neural network model that has sufficient expressive power to perform paralinguistic translation with a single model. We evaluate the proposed method on a digit translation task and show that a single neural network-based model achieves results comparable to the word-dependent models of previous work.
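To make the contrast concrete, the sketch below illustrates the two modeling choices the abstract describes: a closed-form linear regression from source acoustic features to target acoustic features, and a single small neural network trained on the same mapping. All data, feature dimensions (here two features standing in for duration and power), and network sizes are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for paired source/target word-level acoustic
# features (e.g. duration, power). The true mapping and noise level
# are assumptions for illustration only.
X = rng.normal(size=(200, 2))                        # source features
true_W = np.array([[1.5, -0.3], [0.2, 0.8]])
Y = X @ true_W + 0.05 * rng.normal(size=(200, 2))    # target features

# 1) Linear regression with a bias term, solved in closed form,
#    analogous to the word-dependent regression models.
Xb = np.hstack([X, np.ones((len(X), 1))])
W_lin, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
lin_err = np.mean((Xb @ W_lin - Y) ** 2)

# 2) A one-hidden-layer network trained by gradient descent,
#    standing in for the single shared neural model.
H, lr = 16, 0.05
W1 = 0.1 * rng.normal(size=(2, H)); b1 = np.zeros(H)
W2 = 0.1 * rng.normal(size=(H, 2)); b2 = np.zeros(2)
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    grad = 2.0 * (pred - Y) / len(X)      # d(MSE)/d(pred)
    gW2, gb2 = h.T @ grad, grad.sum(0)
    gh = (grad @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
    gW1, gb1 = X.T @ gh, gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
nn_err = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2)

print(f"linear MSE: {lin_err:.4f}, neural net MSE: {nn_err:.4f}")
```

On this toy linear mapping both models fit well; the paper's point is that with many words and nonlinear paralinguistic effects, one shared network can cover what would otherwise require a separate regression per word.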

doi: 10.21437/Interspeech.2013-602

Cite as: Kano, T., Takamichi, S., Sakti, S., Neubig, G., Toda, T., Nakamura, S. (2013) Generalizing continuous-space translation of paralinguistic information. Proc. Interspeech 2013, 2614-2618, doi: 10.21437/Interspeech.2013-602

@inproceedings{kano13_interspeech,
  author={Takatomo Kano and Shinnosuke Takamichi and Sakriani Sakti and Graham Neubig and Tomoki Toda and Satoshi Nakamura},
  title={{Generalizing continuous-space translation of paralinguistic information}},
  booktitle={Proc. Interspeech 2013},
  year={2013},
  pages={2614--2618},
  doi={10.21437/Interspeech.2013-602}
}