INTERSPEECH 2006 - ICSLP
"Meeting" speech, for example from the RT-04S task, contains a mixture of different speaking styles that leads to word error rates higher than 25% even when close-talking microphones are being used. The problem is even more serious, as word error rates are particularly high when speakers use a clear speaking mode, for example because they want to stress an important point. Previous work showed that an approach that combines standard phone-based acoustic models with models detecting the presence or absence of "Articulatory Features" such as "Rounded" or "Voiced" can improve ASR performance particularly for these cases. This paper presents a discriminative approach to automatically computing from training or adaptation data the feature stream weights needed for the above approach, therefore presenting a framework for integrating articulatory features into existing automatic speech recognition systems. We find a 7% relative improvements on top of our best RT-04S system using discriminative adaptation.
Bibliographic reference. Metze, Florian (2006): "Articulatory features for 'meeting' speech recognition", in INTERSPEECH-2006, paper 1891-Mon3WeS.5.