First European Conference on Speech Communication and Technology

Paris, France
September 27-29, 1989

An Information Theory Approach to Speaker Adaptation

Gerhard Rigoll

Fraunhofer-Institute (IAO), Stuttgart, West Germany

This paper describes a novel approach to speaker adaptation. The work was carried out while the author was a visiting scientist at the IBM Thomas Watson Research Center in Yorktown Heights, USA. The goal of the research was to train the IBM speech recognition system with only five minutes of speech and to reach a recognition rate of at least 95% after adaptation on a 5000-word vocabulary recognition task. The adaptation algorithm takes an information-theoretic approach: it estimates the label stream of the new speaker using a stochastic model that describes the spectral differences between the new speaker and a reference speaker. In an evaluation with twelve speakers, the average recognition rate on a 5000-word vocabulary task was 96.4% in ordinary speaker-dependent mode with 20 minutes of training. When the same speakers were tested in adaptation mode with 5 minutes of speech, the recognition rate dropped to 95.2%. Importantly, the average decoding time increased by a factor of only 1.35, whereas this factor is often 3 to 5 with other adaptation algorithms.
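To put the reported figures in perspective, the short sketch below converts the two recognition rates into error rates and compares them. It assumes that the error rate is simply the complement of the recognition rate, which the abstract does not state explicitly; the variable and function names are illustrative only and do not come from the paper.

```python
# Worked arithmetic on the figures quoted in the abstract.
# Assumption (not stated in the abstract): error rate = 100% - recognition rate.

def error_rate(recognition_rate_pct: float) -> float:
    """Return the error rate in percent for a recognition rate in percent."""
    return 100.0 - recognition_rate_pct

dependent_acc = 96.4   # speaker-dependent mode, 20 minutes of training
adapted_acc = 95.2     # adaptation mode, 5 minutes of speech

dependent_err = error_rate(dependent_acc)
adapted_err = error_rate(adapted_acc)

# Absolute increase in percentage points and relative increase as a fraction.
absolute_increase = adapted_err - dependent_err
relative_increase = absolute_increase / dependent_err

# Factor by which the amount of training speech is reduced.
training_reduction = 20 / 5

print(f"error rate, speaker-dependent: {dependent_err:.1f}%")
print(f"error rate, adapted:           {adapted_err:.1f}%")
print(f"absolute increase:             {absolute_increase:.1f} points")
print(f"relative increase:             {relative_increase:.1%}")
print(f"training speech reduced by:    {training_reduction:.0f}x")
```

Under this assumption, cutting the training speech to a quarter raises the error rate from about 3.6% to about 4.8%, roughly a one-third relative increase, while the adapted system still meets the 95% target.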

Bibliographic reference. Rigoll, Gerhard (1989): "An information theory approach to speaker adaptation", in EUROSPEECH-1989, 1494-1497.