This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that word-boundary context markers provide a powerful method of enriching graphemic systems with implicit phonetic information, thereby improving their modelling capability. In addition, a robust technique for full covariance Gaussian modelling in the Minimum Phone Error (MPE) training framework is introduced. This technique reduces full covariance training to a diagonal covariance training problem, thereby solving the related robustness problems. The full system results show that the combined use of these and other techniques within a multi-branch combination framework reduces the Word Error Rate (WER) of the complete system by up to 5.9% relative.
Bibliographic reference. Diehl, F. / Gales, M. J. F. / Liu, X. / Tomalin, M. / Woodland, P. C. (2011): "Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems", In INTERSPEECH-2011, 777-780.