EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Cohorts Based Custom Models for Rapid Speaker and Dialect Adaptation

Jian Wu (1), Eric Chang (2)

(1) The University of Hong Kong, Hong Kong, China
(2) Microsoft Research China, China

It is well known that speaker-dependent acoustic models can achieve an error rate up to a factor of two lower than that of well-trained speaker-independent acoustic models. Thus, to improve accuracy, many modern dictation systems require the user to complete enrollment sessions that adapt the system's acoustic model. In this paper, we present an approach that uses as few as three sentences from the test speaker to select the N closest speakers (cohorts) from both the original training set and newly available training speakers, and constructs customized models from them. With this approach, the adaptation scheme can be updated online without reconfiguring anything that has been calculated before. When applied to dialectal differences, the cohort-based user-specific models constructed from 3 user sentences obtain a lower error rate than user-adapted models based on 170 user sentences.
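
The cohort selection step can be illustrated with a minimal sketch. The abstract does not say how closeness between speakers is measured or how the customized models are built, so everything below is an assumption made for illustration: each training speaker is represented by a single diagonal-covariance Gaussian, closeness is the average log-likelihood of the test speaker's adaptation frames under that Gaussian, and the "custom model" is simply the unweighted average of the cohort means. The names `select_cohorts`, `pooled_mean`, and `speaker_models`, and all numbers, are hypothetical.

```python
import math

def diag_gauss_loglik(frame, mean, var):
    """Log-likelihood of one feature vector under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def select_cohorts(adapt_frames, speaker_models, n):
    """Return the ids of the n training speakers whose models best fit the frames.

    speaker_models maps speaker id -> (mean, var). This is a hypothetical
    stand-in for whatever per-speaker acoustic model a real system keeps.
    """
    scores = {}
    for spk, (mean, var) in speaker_models.items():
        total = sum(diag_gauss_loglik(f, mean, var) for f in adapt_frames)
        scores[spk] = total / len(adapt_frames)
    # Highest average log-likelihood first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

def pooled_mean(cohort_ids, speaker_models):
    """Toy 'custom model' (an assumption): unweighted average of the cohort means."""
    dims = len(next(iter(speaker_models.values()))[0])
    return [
        sum(speaker_models[s][0][d] for s in cohort_ids) / len(cohort_ids)
        for d in range(dims)
    ]

# Toy 2-dimensional example with made-up numbers.
speaker_models = {
    "spk_a": ([0.0, 0.0], [1.0, 1.0]),
    "spk_b": ([1.0, 1.0], [1.0, 1.0]),
    "spk_c": ([5.0, 5.0], [1.0, 1.0]),
}
# Stand-in for features extracted from a few adaptation sentences.
adapt_frames = [[0.9, 1.1], [1.2, 0.8], [0.7, 1.0]]

cohorts = select_cohorts(adapt_frames, speaker_models, n=2)
custom_mean = pooled_mean(cohorts, speaker_models)
```

In this sketch, making a new training speaker available only means adding one entry to `speaker_models`; no existing entry has to be recomputed, which is one way to read the online-update property described above.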

Bibliographic reference. Wu, Jian / Chang, Eric (2001): "Cohorts based custom models for rapid speaker and dialect adaptation", in EUROSPEECH-2001, 1261-1264.