13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation

Ting-yao Hu (1), Yu Tsao (2), Lin-shan Lee (1)

(1) Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
(2) Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan

We propose a discriminative fuzzy clustering maximum a posteriori linear regression (DFCMAPLR) model adaptation approach to compensate for the acoustic mismatch caused by speaker variability. DFCMAPLR estimates the shared affine transforms with the MAP criterion and the fuzzy weight sets with a discriminative objective function. It then forms more specific affine transforms for model adaptation as linear combinations of the shared affine transforms, weighted by the estimated fuzzy weights. By incorporating the MAP criterion and discriminative information, DFCMAPLR estimates the shared affine transforms reliably and enhances the discriminative power of the adapted acoustic model. Experimental results on the ASTTEL200 Mandarin corpus show that DFCMAPLR outperforms both conventional maximum likelihood linear regression (MLLR) and fuzzy clustering MLLR (FCMLLR), which estimates both the shared affine transforms and the fuzzy weight sets under the maximum likelihood criterion. Compared with the baseline, DFCMAPLR achieves a 9.86% relative reduction in average phone error rate (PER), from 24.04% to 21.67%.
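The combination step described in the abstract can be illustrated with a minimal sketch. It assumes MLLR-style mean adaptation, where each shared transform W_k = [b_k, A_k] maps a Gaussian mean mu to A_k mu + b_k through the extended vector [1; mu], and that the fuzzy weights for one Gaussian are non-negative and sum to one. The function names (`combine_transforms`, `adapt_mean`, `relative_reduction`) and the toy transforms are hypothetical. The MAP and discriminative estimation of the transforms and weights, which is the paper's actual contribution, is not shown. The last helper only checks the abstract's arithmetic, (24.04 - 21.67) / 24.04 ≈ 9.86%.

```python
import numpy as np


def combine_transforms(shared_transforms, fuzzy_weights):
    """Form one specific affine transform as a fuzzy-weighted sum of shared ones.

    shared_transforms: shape (K, D, D + 1); each W_k = [b_k, A_k] (assumed layout).
    fuzzy_weights: shape (K,); assumed non-negative and summing to one.
    """
    W = np.asarray(shared_transforms, dtype=float)
    c = np.asarray(fuzzy_weights, dtype=float)
    if W.ndim != 3 or c.ndim != 1 or W.shape[0] != c.shape[0]:
        raise ValueError("expected K transforms of shape (D, D+1) and K weights")
    # Contract the cluster axis: W_m = sum_k c_k * W_k, shape (D, D + 1)
    return np.tensordot(c, W, axes=1)


def adapt_mean(transform, mean):
    """Apply an MLLR-style affine transform: mu_hat = A mu + b = W [1; mu]."""
    xi = np.concatenate(([1.0], np.asarray(mean, dtype=float)))
    return transform @ xi


def relative_reduction(baseline, adapted):
    """Relative error-rate reduction in percent."""
    return 100.0 * (baseline - adapted) / baseline


if __name__ == "__main__":
    # Two hypothetical shared transforms for 2-dimensional means (D = 2).
    W1 = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])    # identity, zero bias
    W2 = np.array([[1.0, 2.0, 0.0],
                   [-1.0, 0.0, 2.0]])   # scale by 2, bias (1, -1)
    W_m = combine_transforms([W1, W2], [0.75, 0.25])
    print(adapt_mean(W_m, [1.0, 2.0]))  # adapted mean: 1.5, 2.25
    print(round(relative_reduction(24.04, 21.67), 2))  # 9.86
```

Because the combination is linear, the adapted mean (1.5, 2.25) equals 0.75 times the mean produced by W1, (1, 2), plus 0.25 times the mean produced by W2, (3, 3). The fuzzy weights therefore let each Gaussian interpolate between a small number of reliably estimated shared transforms instead of requiring its own transform.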

Index Terms: speech recognition, speaker adaptation, FCMLLR


Bibliographic reference.  Hu, Ting-yao / Tsao, Yu / Lee, Lin-shan (2012): "Discriminative fuzzy clustering maximum a posterior linear regression for speaker adaptation", In INTERSPEECH-2012, 567-570.