The Universal Background Model (UBM) is a speaker-independent Gaussian Mixture Model (GMM) trained on a large speech corpus containing recordings of many speakers in varied conditions. When the test data are noisy, a UBM trained on clean data is generally suboptimal. Training the UBM on noisy data, however, biases it toward the specific development noise samples, degrading speaker recognition performance under unseen noise types. In this study, we utilize an Acoustic Factor Analysis (AFA) based UBM that iteratively learns the dominant feature sub-space of each mixture component, yielding a more robust model. We explore two variants of the model: one with isotropic and the other with diagonal residual noise. Maximum-Likelihood (ML) training formulations are provided for both. The latent variables of the model, termed acoustic factors, are used as features to train a second stage of factor analysis parameters, i.e., the traditional i-vector extractor. Experiments on the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) demonstrate the effectiveness of the proposed strategy in both clean and noisy conditions.
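The per-component factor analysis underlying an AFA-style UBM can be sketched in the conventional mixture-of-factor-analyzers form (a notational sketch only; symbol names here are illustrative and the paper's exact parameterization may differ):

```latex
% Feature vector x generated by mixture component k, with latent
% acoustic factors z and factor loading matrix W_k:
%   x = \mu_k + W_k z + \epsilon_k, \qquad z \sim \mathcal{N}(0, I)
%
% The two residual-noise variants explored in the study:
%   Isotropic: \epsilon_k \sim \mathcal{N}(0, \sigma_k^2 I)
%   Diagonal:  \epsilon_k \sim \mathcal{N}(0, \Psi_k), \quad \Psi_k \text{ diagonal}
```

Here the columns of each loading matrix span the dominant feature sub-space of that component, and the posterior estimates of z serve as the acoustic-factor features passed to the i-vector extractor.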
Bibliographic reference. Hasan, Taufiq / Hansen, John H. L. (2013): "Acoustic factor analysis based universal background model for robust speaker verification in noise", in INTERSPEECH-2013, pp. 3127-3131.