5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Using Untranscribed Training Data to Improve Performance

George Zavaliagkos, Man-Hung Siu, Thomas Colthurst, Jayadev Billa

BBN Technologies, USA

This paper explores techniques for using pools of untranscribed training data to increase the amount of training data available to automatic speech recognition systems. It is well established that current speech recognition technology, especially in Large Vocabulary Conversational Speech Recognition (LVCSR), is largely language independent, and that the dominant factor in performance on a given language is the amount of available training data. The paper addresses this need for more training data by presenting ways to use untranscribed acoustic data to enlarge the training set and thus improve recognition performance.

Bibliographic reference. Zavaliagkos, George / Siu, Man-Hung / Colthurst, Thomas / Billa, Jayadev (1998): "Using untranscribed training data to improve performance", in ICSLP-1998, paper 1007.