Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Chinese Input Method Based on Reduced Mandarin Phonetic Alphabet

Chun-Han Tseng, Chia-Ping Chen

National Sun Yat-Sen University, Taiwan

In this paper we study the problem of simplifying Chinese input methods to make them suitable for use on mobile devices. To assess the feasibility of aggressively reducing the number of keystrokes per Chinese character, we compare three input modes: character-based, syllable-based and first-symbol-based. Specifically, we use these linguistic units as token types and compare the resulting perplexities. With language models trained on data derived from the ASBC corpus, the perplexity of a data set we collected from on-line chat and instant messages is 102.6 for the character-based model, 67.7 for the syllable-based model and 16.3 for the first-symbol-based model. Arguing from the relation between perplexity and the number of "typical" sentences of a language model, we conclude that on average there are 6 to 7 characters per first symbol in natural Chinese text.
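
The abstract does not spell out the final calculation. One plausible reading, offered here only as an assumption, is that a model with perplexity PP has roughly PP^N typical sentences of length N, so the number of character sequences per first-symbol sequence is (PP_char / PP_first)^N, or PP_char / PP_first per token position. The sketch below applies that ratio to the perplexities quoted above; the function name is illustrative and not taken from the paper.

```python
# Perplexities as reported in the abstract.
PP_CHAR = 102.6          # character-based model
PP_SYLLABLE = 67.7       # syllable-based model
PP_FIRST_SYMBOL = 16.3   # first-symbol-based model


def candidates_per_token(pp_fine: float, pp_coarse: float) -> float:
    """Average number of fine-grained tokens per coarse-grained token.

    Assumption: a model with perplexity PP has about PP**N typical
    sentences of length N, so the per-position ambiguity is the ratio
    of the two perplexities.
    """
    return pp_fine / pp_coarse


chars_per_first_symbol = candidates_per_token(PP_CHAR, PP_FIRST_SYMBOL)
print(f"characters per first symbol: {chars_per_first_symbol:.2f}")  # 6.29
```

Under this reading the ratio falls between 6 and 7, which agrees with the figure stated in the abstract.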


Bibliographic reference.  Tseng, Chun-Han / Chen, Chia-Ping (2006): "Chinese input method based on reduced Mandarin phonetic alphabet", In INTERSPEECH-2006, paper 1944-Mon3FoP.11.