INTERSPEECH 2006 - ICSLP
In this work, we present our progress in multi-source far-field automatic speech-to-text transcription of lecture speech. In particular, we show how the best of several far-field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined, either at the waveform level using blind channel combination or at the hypothesis level using confusion network techniques, to improve the accuracy of a far-field lecture transcription system. Using the techniques described here, we ran a series of experiments on the test set used by the US National Institute of Standards and Technology for the RT-05S evaluation. For the multiple distant microphones (MDM) task of RT-05S, our system achieved a word error rate of 38.5%, which represents an improvement of over 13% absolute compared to the best reported results in the RT-05S evaluation.
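The channel selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the signal-to-noise ratio is estimated, so the estimator below is an assumption: it treats the quietest 10% of frames as noise and the loudest 10% as speech plus noise. All function names, the frame length, and the synthetic test signal are hypothetical and are not taken from the paper.

```python
import math
import random


def frame_energies(samples, frame_len=400):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples, frame_len=400):
    """Crude SNR estimate in dB (assumed estimator, not the paper's).

    The quietest 10% of frames stand in for the noise floor and the
    loudest 10% for speech plus noise.
    """
    energies = sorted(frame_energies(samples, frame_len))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k
    speech = sum(energies[-k:]) / k
    eps = 1e-12  # guards against log(0) and division by zero
    return 10.0 * math.log10((speech + eps) / (noise + eps))


def select_best_channel(channels, frame_len=400):
    """Return (index of the channel with the highest estimated SNR, all SNRs)."""
    snrs = [estimate_snr_db(ch, frame_len) for ch in channels]
    best = max(range(len(snrs)), key=snrs.__getitem__)
    return best, snrs


def make_channel(noise_sigma, seed, n=16000, rate=16000):
    """Synthetic channel: silence, then a 200 Hz tone, plus Gaussian noise."""
    rng = random.Random(seed)
    out = []
    for t in range(n):
        tone = math.sin(2 * math.pi * 200 * t / rate) if t >= n // 2 else 0.0
        out.append(tone + rng.gauss(0.0, noise_sigma))
    return out


if __name__ == "__main__":
    channels = [make_channel(0.05, seed=1), make_channel(0.5, seed=2)]
    best, snrs = select_best_channel(channels)
    print("estimated SNRs (dB):", [round(s, 1) for s in snrs])
    print("selected channel:", best)
```

In this toy setup, the channel with the lower noise level receives the higher SNR estimate and is selected. A real far-field system would more likely estimate SNR from a speech/non-speech segmentation and might reselect the channel per utterance; both are left out of this sketch.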
Bibliographic reference. Wölfel, Matthias / Fügen, Christian / Ikbal, Shajith / McDonough, John W. (2006): "Multi-source far-distance microphone selection and combination for automatic transcription of lectures", In INTERSPEECH-2006, paper 1253-Mon2BuP.5.