Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors

Aravind Illa, Prasanta Kumar Ghosh


Speech production involves the movement of various articulators, including the tongue, jaw, and lips. Estimating the movement of the articulators from the acoustics of speech is known as acoustic-to-articulatory inversion (AAI). Recently, it has been shown that pooling the acoustic-articulatory data from multiple speakers is more beneficial than training AAI in a speaker-specific manner. Further, conditioning AAI on speaker-specific information, supplied as a one-hot encoding at the input alongside the acoustic features, improves AAI performance when the training and test speakers come from the same closed set. In this work, we carry out an experimental study on the benefit of using x-vectors to provide the speaker-specific information that conditions AAI. Experiments with 30 speakers show that AAI performance benefits from the use of x-vectors in a closed-set, seen-speaker condition. Further, x-vectors also generalize well to unseen-speaker evaluation.
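The input-level conditioning described in the abstract can be illustrated with a minimal sketch, in which a fixed speaker vector (a one-hot code or an x-vector) is appended to every acoustic frame before it is passed to the AAI model. This is not the authors' implementation: the function names, the 13-dimensional acoustic features, the 512-dimensional x-vector, and the random placeholder data are all assumptions chosen for illustration. Only the figure of 30 speakers comes from the abstract.

```python
import numpy as np


def condition_on_speaker(acoustic_frames, speaker_vector):
    """Append the same speaker vector to every acoustic frame.

    acoustic_frames: array of shape (num_frames, acoustic_dim)
    speaker_vector:  array of shape (speaker_dim,)
    Returns an array of shape (num_frames, acoustic_dim + speaker_dim).
    """
    frames = np.asarray(acoustic_frames, dtype=np.float32)
    spk = np.asarray(speaker_vector, dtype=np.float32)
    if frames.ndim != 2 or spk.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D speaker vector")
    # Repeat the utterance-level speaker vector once per frame.
    tiled = np.tile(spk, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


def one_hot(speaker_index, num_speakers):
    """One-hot speaker code; only defined for speakers seen in training."""
    vec = np.zeros(num_speakers, dtype=np.float32)
    vec[speaker_index] = 1.0
    return vec


# Placeholder data (assumed sizes): 200 frames of 13-dim acoustic features
# and a 512-dim x-vector. Real x-vectors would come from a pretrained
# speaker-embedding extractor, which is not shown here.
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 13))
xvec = rng.standard_normal(512)

x_in = condition_on_speaker(frames, xvec)                # x-vector conditioning
oh_in = condition_on_speaker(frames, one_hot(3, 30))     # one-hot, 30 speakers
print(x_in.shape, oh_in.shape)
```

The sketch also hints at why an embedding may suit unseen speakers better: a one-hot code has an entry only for each training speaker, whereas an x-vector can be computed for any speaker with available audio.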


DOI: 10.21437/Interspeech.2020-1222

Cite as: Illa, A., Ghosh, P.K. (2020) Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors. Proc. Interspeech 2020, 1376-1380, DOI: 10.21437/Interspeech.2020-1222.


@inproceedings{Illa2020,
  author={Aravind Illa and Prasanta Kumar Ghosh},
  title={{Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1376--1380},
  doi={10.21437/Interspeech.2020-1222},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1222}
}