This paper proposes training synthetic speaker models using vocal tract length normalization (VTLN). Speaker-adaptation-based approaches require a certain amount of data from the test speaker to either update or transform the parameters of the trained model. When little or no data is available from the test speaker, we propose creating a synthetic speaker model that is acoustically close to the test speaker by scaling the training data with VTLN. To this end, we train multiple VTLN-warped speaker-independent (SI) models, each on training data scaled by a different warping factor, and perform recognition with the model that is acoustically closest to the test speaker. We show that the proposed approach is advantageous in mismatched speaker conditions, especially when recognizing child speakers with models trained on adult speech.
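The select-a-warped-model idea can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation. Feature vectors are formant-like frequencies in Hz rather than cepstral features. Each "SI model" is a single diagonal Gaussian rather than an HMM-GMM system. The warp is a common piecewise-linear form with an assumed breakpoint. "Acoustically closest" is interpreted as highest log-likelihood on the test speaker's data. The function names (`warp_frequency`, `train_si_model`, `select_synthetic_speaker`) and the data are hypothetical.

```python
import math
from statistics import fmean, pvariance

F_NYQ = 8000.0  # assumed Nyquist frequency (16 kHz audio)


def warp_frequency(f, alpha, f_nyq=F_NYQ, cut=0.8):
    """Assumed piecewise-linear VTLN warp.

    Below the breakpoint f0, frequencies are scaled by alpha; above it, a
    second linear segment maps f0 -> alpha*f0 and f_nyq -> f_nyq so the
    warped axis stays within the original bandwidth.
    """
    f0 = cut * f_nyq / max(alpha, 1.0)
    if f <= f0:
        return alpha * f
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (f - f0)


def train_si_model(train_vectors, alpha):
    """Toy 'SI model': one diagonal Gaussian fitted to VTLN-warped training data."""
    warped = [[warp_frequency(f, alpha) for f in v] for v in train_vectors]
    return [(fmean(dim), pvariance(dim)) for dim in zip(*warped)]


def log_likelihood(model, vectors):
    """Total diagonal-Gaussian log-likelihood of the vectors under the model."""
    total = 0.0
    for v in vectors:
        for x, (mu, var) in zip(v, model):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total


def select_synthetic_speaker(train_vectors, test_vectors, alphas):
    """Train one warped SI model per warping factor and return the factor whose
    model scores the test speaker's data highest (assumed selection criterion)."""
    models = {a: train_si_model(train_vectors, a) for a in alphas}
    return max(alphas, key=lambda a: log_likelihood(models[a], test_vectors))


# Hypothetical data: "adult" training vectors and a "child" test speaker whose
# frequencies are roughly 1.2 times higher (shorter vocal tract).
adult_train = [[480, 1450, 2450], [500, 1500, 2500], [520, 1550, 2550]]
child_test = [[600, 1800, 3000], [612, 1830, 3060]]
alphas = [0.9, 1.0, 1.1, 1.2, 1.3]
print(select_synthetic_speaker(adult_train, child_test, alphas))  # 1.2
```

In this toy setup, scaling the adult training data up by the matching factor produces the model that fits the child data best. Note that whether alpha > 1 compresses or stretches the spectrum is a convention that varies between toolkits.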
Bibliographic reference. Sanand, D. R. / Svendsen, T. (2013): "Synthetic speaker models using VTLN to improve the performance of children in mismatched speaker conditions for ASR", In INTERSPEECH-2013, 3361-3365.