Generalized Distillation Framework for Speaker Normalization

Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham


The generalized distillation framework has previously been shown to be effective for speech enhancement. In this paper, we extend the idea to speaker normalization without any explicit adaptation data. Generalized distillation assumes that, in addition to the training data, some “privileged” information is available to guide the training process. In the proposed approach, the privileged information is obtained from a “teacher” model trained on speaker-normalized FMLLR features. The “student” model is trained on un-normalized filterbank features and uses the teacher’s supervision during cross-entropy training. Unlike FMLLR or i-vector approaches, the proposed distillation method needs no first-pass decode information at test time and imposes no constraint on the duration of the test data for computing speaker-specific transforms. Experiments on the Switchboard and AMI corpora show that the generalized distillation framework improves over un-normalized features, with or without i-vectors.
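
To illustrate how a teacher's supervision can enter cross-entropy training, the following is a minimal sketch of the commonly used generalized distillation objective for a single frame: a weighted sum of the cross-entropy against the hard label and the cross-entropy against temperature-softened teacher posteriors. The abstract does not give the exact loss, so the imitation weight `lam`, the `temperature`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(student_logits, teacher_logits, label,
                      lam=0.5, temperature=2.0):
    """Hypothetical per-frame loss: (1 - lam) * hard-label CE + lam * soft-target CE.

    lam and temperature are illustrative values, not taken from the paper.
    """
    one_hot = [1.0 if k == label else 0.0 for k in range(len(student_logits))]
    student_probs = softmax(student_logits)
    # Soft targets: teacher posteriors smoothed by the temperature.
    soft_targets = softmax(teacher_logits, temperature)
    hard_term = cross_entropy(one_hot, student_probs)
    soft_term = cross_entropy(soft_targets, student_probs)
    return (1.0 - lam) * hard_term + lam * soft_term
```

In the setting described above, `teacher_logits` would come from the model fed FMLLR features and `student_logits` from the model fed filterbank features of the same frame. Only the student is evaluated at test time, which is why no speaker-specific transform has to be estimated.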


DOI: 10.21437/Interspeech.2017-874

Cite as: Joy, N.M., Kothinti, S.R., Umesh, S., Abraham, B. (2017) Generalized Distillation Framework for Speaker Normalization. Proc. Interspeech 2017, 739-743, DOI: 10.21437/Interspeech.2017-874.


@inproceedings{Joy2017,
  author={Neethu Mariam Joy and Sandeep Reddy Kothinti and S. Umesh and Basil Abraham},
  title={Generalized Distillation Framework for Speaker Normalization},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={739--743},
  doi={10.21437/Interspeech.2017-874},
  url={http://dx.doi.org/10.21437/Interspeech.2017-874}
}