INTERSPEECH 2006 - ICSLP

We propose a vector space approach to characterizing environments for robust speech recognition. We represent a given environment by a supervector formed by concatenating the mean vectors of all Gaussian mixture components of the state observation densities of all hidden Markov models trained in that environment. A supervector for a new environment can then be obtained in one of two ways: by interpolating among a collection of supervectors trained from many real or simulated environments, or by transforming an anchor supervector for a specific environment, such as a clean condition. At a 5 dB signal-to-noise ratio (SNR), with only two adaptation utterances, both the interpolation-based and the transformation-based approaches achieve a significant error rate reduction of close to 47% over a baseline system with cepstral mean subtraction (CMS). When N-best information is incorporated to perform unsupervised adaptation at 5 dB SNR with the same two utterances, we achieve a relative error reduction of about 40%, close to that achieved in the supervised mode.
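As an illustration only, the supervector construction and the two ways of deriving a new environment supervector might be sketched as follows. The abstract does not give the exact formulation, so this sketch assumes a plain linear combination for the interpolation and an affine map for the transformation; the function names, the toy model sizes, and the weights are hypothetical, and the estimation of the weights or transform from adaptation utterances is not shown.

```python
import numpy as np


def make_supervector(hmm_means):
    """Concatenate the Gaussian mean vectors of all HMM states into one 1-D vector.

    hmm_means: list of arrays, one per model, each of shape (n_components, dim).
    """
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in hmm_means])


def interpolate(supervectors, weights):
    """Assumed form: weighted sum of known environment supervectors.

    supervectors: array of shape (n_environments, supervector_length).
    weights: array of shape (n_environments,), taken as given here.
    """
    return np.asarray(weights, dtype=float) @ np.asarray(supervectors, dtype=float)


def transform(anchor, A, b):
    """Assumed form: affine map A @ anchor + b applied to an anchor supervector."""
    return np.asarray(A, dtype=float) @ np.asarray(anchor, dtype=float) + np.asarray(b, dtype=float)


# Toy example: two models with 2-dimensional means (values are made up).
clean_means = [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[2.0, 2.0]])]
noisy_means = [m + 1.0 for m in clean_means]

clean_sv = make_supervector(clean_means)
noisy_sv = make_supervector(noisy_means)

# Interpolation between the two known environments with hypothetical weights.
interp_sv = interpolate(np.stack([clean_sv, noisy_sv]), [0.25, 0.75])

# Transformation of the clean anchor with an identity matrix and a constant shift.
trans_sv = transform(clean_sv, np.eye(clean_sv.size), np.ones(clean_sv.size))
```

In this toy setup the shifted clean anchor coincides with the noisy supervector, which shows how a transformation of a single anchor can stand in for a separately trained environment model.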
Bibliographic reference. Tsao, Yu / Lee, Chin-Hui (2006): "A vector space approach to environment modeling for robust speech recognition", In INTERSPEECH-2006, paper 1617-Tue1A2O.5.