Multi-Task Learning for Voice Related Recognition Tasks

Ana Montalvo, Jose R. Calvo, Jean-François Bonastre


Speech is a complex signal that conveys a great deal of information about the message as well as various characteristics of the speaker: their sex, age, accent, and language. Understanding how machine learning (ML) systems use these features has two main advantages. First, it could help prevent bias and discrimination in ML speech applications. Second, jointly modeling this information with multi-task learning (MTL) approaches offers great potential for improvement. In this paper, we explore the use of MTL in non-linguistic tasks. We compare single- and multi-task models applied to three tasks: (Spanish) nativeness, speaker, and sex recognition. We investigate the effect of training data set size on the performance of both single- and multi-task models, as well as the specific contribution of nativeness and sex information to speaker recognition. Experimental results show that multi-task models outperform single-task models. We also find that MTL is beneficial for small training data sets and for low-level acoustic features, rather than for pretrained features such as bottleneck features. Our results also indicate that more attention should be paid to the information used by ML approaches in order to prevent bias or discrimination.


DOI: 10.21437/Interspeech.2020-1857

Cite as: Montalvo, A., Calvo, J.R., Bonastre, J. (2020) Multi-Task Learning for Voice Related Recognition Tasks. Proc. Interspeech 2020, 2997-3001, DOI: 10.21437/Interspeech.2020-1857.


@inproceedings{Montalvo2020,
  author={Ana Montalvo and Jose R. Calvo and Jean-François Bonastre},
  title={{Multi-Task Learning for Voice Related Recognition Tasks}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2997--3001},
  doi={10.21437/Interspeech.2020-1857},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1857}
}