Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input

Kouichi Katsurada, Korin Richmond


We describe a speaker-independent mel-cepstrum estimation system that takes electromagnetic articulography (EMA) data as input. The system captures speaker information with d-vectors generated from the EMA data. We also investigate the effect of making the input vectors given to the mel-cepstrum estimator speaker-independent. To do this, we introduce a two-stage network: the first stage is trained to output EMA sequences that are averaged across all speakers on a per-triphone basis (and so are speaker-independent), and the second stage receives these as input for mel-cepstrum estimation. Experimental results show that using the d-vectors improves mel-cepstrum estimation by 0.19 dB in terms of mel-cepstrum distortion on the closed-speaker test set. Giving triphone-averaged EMA data to the mel-cepstrum estimator improves performance by a further 0.16 dB, which indicates that speaker-independent input has a positive effect on mel-cepstrum estimation.
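
The improvements above are reported as mel-cepstrum distortion (MCD) in dB. The abstract does not give the exact definition used, so the following is only a minimal sketch of the commonly used form of the metric, under stated assumptions: the two sequences are already time-aligned, the 0th (energy) coefficient is excluded by default, and the function name `mel_cepstral_distortion` and its `skip_c0` parameter are hypothetical, not taken from the paper.

```python
import math


def mel_cepstral_distortion(reference, estimate, skip_c0=True):
    """Mean MCD in dB between two time-aligned mel-cepstrum sequences.

    Each sequence is a list of frames; each frame is a list of coefficients.
    Assumption: this is the common textbook form of the metric, not
    necessarily the exact variant used in the paper.
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("sequences must be non-empty and time-aligned")

    start = 1 if skip_c0 else 0  # optionally drop the energy term c0
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref_frame, est_frame in zip(reference, estimate):
        if len(ref_frame) != len(est_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over the retained coefficients
        squared = sum(
            (r - e) ** 2
            for r, e in zip(ref_frame[start:], est_frame[start:])
        )
        total += scale * math.sqrt(squared)

    # Average the per-frame distortion over all frames
    return total / len(reference)


if __name__ == "__main__":
    # Toy frames (illustrative values only, not data from the paper)
    ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    est = [[1.0, 0.4, -0.2], [0.9, 0.4, -0.3]]
    print(f"MCD: {mel_cepstral_distortion(ref, est):.3f} dB")
```

Lower values indicate a closer match between estimated and reference mel-cepstra, so an improvement of 0.19 dB corresponds to a reduction in this quantity.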


DOI: 10.21437/Interspeech.2020-1630

Cite as: Katsurada, K., Richmond, K. (2020) Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input. Proc. Interspeech 2020, 3176-3180, DOI: 10.21437/Interspeech.2020-1630.


@inproceedings{Katsurada2020,
  author={Kouichi Katsurada and Korin Richmond},
  title={{Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3176--3180},
  doi={10.21437/Interspeech.2020-1630},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1630}
}