5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Chinese Language Model Adaptation Based on Document Classification and Multiple Domain-Specific Language Models

Sung-Chien Lin (1), Chi-Lung Tsai (1), Lee-Feng Chien (2), Ker-Jiann Chen (2), Lin-Shan Lee (1,2)

(1) Dept. of Computer Science and Information Engineering, National Taiwan University
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan, Republic of China

Adapting language models to specific subject domains is important for real speech recognition applications. In this paper, a Chinese language model adaptation approach is presented, based mainly on document classification and multiple domain-specific language models. The proposed document classification method uses the perplexity value and the word bigram coverage value as its primary measures. It is therefore able to model word associations and syntactic behavior when classifying documents into clusters, and thus creates more effective domain-specific language models. Language model adaptation in speech recognition can then be achieved effectively by selecting the most appropriate domain-specific language model. Preliminary tests in Mandarin speech recognition have shown promising performance of the proposed approach for real applications.
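To make the two classification measures concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each domain-specific model is a word bigram model with add-k smoothing, that documents arrive as pre-segmented word lists (Chinese word segmentation is outside its scope), and that the most appropriate model is the one with the lowest perplexity, with higher bigram coverage breaking ties. The class `BigramLM`, the function `select_domain`, the smoothing constant `k`, the tie-breaking rule, and the toy training data are all hypothetical; the abstract does not specify how the paper smooths its models or combines the two measures.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical domain-specific bigram model with add-k smoothing (an assumption)."""

    def __init__(self, sentences, k=0.5):
        self.k = k
        self.history = Counter()   # counts of words used as a bigram history
        self.bigrams = Counter()   # counts of (previous word, word) pairs
        words_seen = set()
        for words in sentences:
            tokens = ["<s>"] + words
            words_seen.update(words)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(words_seen) + 1  # +1 reserves mass for unseen words

    def prob(self, prev, word):
        # Add-k smoothed conditional probability P(word | prev)
        return (self.bigrams[(prev, word)] + self.k) / (
            self.history[prev] + self.k * self.vocab_size)

    def perplexity(self, sentences):
        # exp of the average negative log probability over all bigrams in the document
        log_sum, n = 0.0, 0
        for words in sentences:
            tokens = ["<s>"] + words
            for prev, word in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(prev, word))
                n += 1
        return math.exp(-log_sum / n)

    def bigram_coverage(self, sentences):
        # Fraction of the document's bigrams that were observed in this domain's training data
        doc_bigrams = [bg for words in sentences for bg in zip(["<s>"] + words, words)]
        covered = sum(1 for bg in doc_bigrams if bg in self.bigrams)
        return covered / len(doc_bigrams)


def select_domain(models, sentences):
    # Hypothetical rule: lowest perplexity wins; higher bigram coverage breaks ties
    return min(models, key=lambda name: (models[name].perplexity(sentences),
                                         -models[name].bigram_coverage(sentences)))


# Toy usage with invented training sentences
sports = BigramLM([["the", "team", "won", "the", "game"], ["the", "team", "lost"]])
finance = BigramLM([["the", "stock", "price", "rose"], ["the", "market", "fell"]])
models = {"sports": sports, "finance": finance}
doc = [["the", "team", "won"]]
print(select_domain(models, doc))  # sports
```

Perplexity rewards a model that assigns high probability to the document's word sequence, while bigram coverage measures how many of the document's word pairs the domain has seen at all. Keeping the two as separate measures lets a classifier distinguish a domain that merely shares common words from one that shares actual word associations.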


Bibliographic reference.  Lin, Sung-Chien / Tsai, Chi-Lung / Chien, Lee-Feng / Chen, Ker-Jiann / Lee, Lin-Shan (1997): "Chinese language model adaptation based on document classification and multiple domain-specific language models", In EUROSPEECH-1997, 1463-1466.