Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Rapid Speaker Adaptation Using Regression-Tree Based Spectral Peak Alignment

Shizhen Wang (1), Xiaodong Cui (2), Abeer Alwan (1)

(1) University of California at Los Angeles, USA; (2) IBM T.J. Watson Research Center, USA

In this paper, regression-tree based spectral peak alignment is proposed for rapid speaker adaptation, building on a linearization of vocal tract length normalization (VTLN). Two types of regression classes are investigated: phonetic classes (derived by combining knowledge-based and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, the proposed method improves performance for both supervised and unsupervised adaptation on both a medium-vocabulary task and a connected-digit recognition task. To improve performance further, MLLR is integrated into the regression-tree based peak alignment. Experimental results show that these improvements are achieved even with limited adaptation data.
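The abstract does not spell out the transform, so the following is only a rough illustration of what "linearization of VTLN" plus per-class peak alignment can look like, not the authors' implementation. It assumes (1) a piecewise-linear frequency warping whose low-frequency slope alpha is chosen so that one spectral peak of the test speaker lines up with the corresponding reference peak, and (2) a continuous cosine-series cepstrum, log S(w) = c0 + 2 * sum_k ck * cos(k*w), instead of the mel-filterbank MFCCs a real recognizer uses. Under these assumptions, warping the log spectrum becomes a matrix acting on the cepstral vector, and one alpha (hence one matrix) per regression class is applied to the static cepstral means of that class's Gaussians. The function names `warp`, `alpha_from_peaks`, `cepstral_warp_matrix` and `adapt_means`, the 0.85 breakpoint, the grid size and the example peak frequencies are all hypothetical.

```python
import math


def warp(omega, alpha, cutoff=0.85):
    """Hypothetical piecewise-linear warping psi on [0, pi].

    Slope alpha up to a breakpoint w0, then a straight line to (pi, pi),
    so psi(0) = 0, psi(pi) = pi, and psi stays monotonic for alpha > 0.
    """
    w0 = cutoff * math.pi * min(1.0, 1.0 / alpha)
    if omega <= w0:
        return alpha * omega
    return alpha * w0 + (math.pi - alpha * w0) / (math.pi - w0) * (omega - w0)


def alpha_from_peaks(ref_peak_hz, spk_peak_hz):
    """Slope chosen so that psi(w_spk) = w_ref for one aligned spectral peak."""
    return ref_peak_hz / spk_peak_hz


def cepstral_warp_matrix(alpha, n_ceps, n_grid=2000):
    """Matrix A with c_warped = A @ c, assuming log S(w) = c0 + 2*sum_k ck*cos(k*w).

    Projects the warped log spectrum log S(psi(w)) back onto cos(n*w):
      A[n][0] = (1/pi) * integral_0^pi cos(n*w) dw
      A[n][k] = (2/pi) * integral_0^pi cos(k*psi(w)) * cos(n*w) dw,  k >= 1
    The integrals are approximated with the midpoint rule on n_grid points.
    """
    h = math.pi / n_grid
    grid = [(i + 0.5) * h for i in range(n_grid)]
    psi = [warp(w, alpha) for w in grid]
    A = [[0.0] * n_ceps for _ in range(n_ceps)]
    for n in range(n_ceps):
        cos_n = [math.cos(n * w) for w in grid]
        for k in range(n_ceps):
            weight = 1.0 if k == 0 else 2.0
            s = sum(math.cos(k * p) * c for p, c in zip(psi, cos_n))
            A[n][k] = weight * s * h / math.pi
    return A


def adapt_means(means, gaussian_class, class_alpha, n_ceps):
    """Apply one warping matrix per regression class to static cepstral means."""
    mats = {r: cepstral_warp_matrix(a, n_ceps) for r, a in class_alpha.items()}
    adapted = []
    for mu, r in zip(means, gaussian_class):
        A = mats[r]
        adapted.append([sum(A[n][k] * mu[k] for k in range(n_ceps))
                        for n in range(n_ceps)])
    return adapted


if __name__ == "__main__":
    # Hypothetical peaks: reference 2500 Hz, test speaker 3000 Hz, fs = 16 kHz.
    alpha = alpha_from_peaks(2500.0, 3000.0)
    w_spk = 2 * math.pi * 3000.0 / 16000.0
    print(f"alpha = {alpha:.4f}, psi(w_spk) = {warp(w_spk, alpha):.4f} rad")
    means = [[1.0, 0.5, -0.2, 0.1], [0.3, -0.4, 0.2, 0.0]]
    print(adapt_means(means, ["vowel", "fricative"],
                      {"vowel": alpha, "fricative": 1.0}, n_ceps=4))
```

With alpha = 1 the matrix reduces to the identity, which is a convenient sanity check. Because the adapted model spectrum is S_ref(psi(w)) with psi(w_spk) = w_ref, the aligned reference peak reappears at the speaker's peak frequency. Estimating a separate alpha for each regression class (for example each phonetic class) rather than one global warping factor is the part the regression tree contributes in this sketch.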


Bibliographic reference: Wang, Shizhen / Cui, Xiaodong / Alwan, Abeer (2006): "Rapid speaker adaptation using regression-tree based spectral peak alignment", in INTERSPEECH-2006, paper 1334-Wed1A2O.1.