12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems

F. Diehl, M. J. F. Gales, X. Liu, M. Tomalin, P. C. Woodland

University of Cambridge, UK

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that word-boundary context markers provide a powerful method of enhancing graphemic systems with implicit phonetic information, thereby improving their modelling capability. In addition, a robust technique for full covariance Gaussian modelling in the Minimum Phone Error (MPE) training framework is introduced. This technique reduces full covariance training to a diagonal covariance training problem, thereby solving the associated robustness problems. The full system results show that the combined use of these and other techniques within a multi-branch combination framework reduces the Word Error Rate (WER) of the complete system by up to 5.9% relative.

Bibliographic reference. Diehl, F. / Gales, M. J. F. / Liu, X. / Tomalin, M. / Woodland, P. C. (2011): "Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems", in INTERSPEECH-2011, 777-780.