INTERSPEECH 2006 - ICSLP
In this paper we study the problem of simplifying the Chinese input method to make it suitable for use with mobile devices. To assess the feasibility of aggressively reducing the number of keystrokes per Chinese character, we compare three input modes: character-based, syllable-based and first-symbol-based. Specifically, we use each of these linguistic units as the token type of a language model and compare the resulting perplexities. With language models trained on data from the ASBC corpus, the perplexity of a data set we collected from on-line chat and instant messages is 102.6 for the character-based model, 67.7 for the syllable-based model and 16.3 for the first-symbol-based model. Arguing from the relation between perplexity and the number of "typical" sentences of a language model, we conclude that on average there are 6 to 7 characters per first symbol in natural Chinese.
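The arithmetic behind the "6 to 7" figure can be sketched as follows. This is one plausible reading rather than the paper's stated derivation: it assumes that a model with perplexity PP has roughly PP**n typical sentences of length n, so that the per-position ratio between two tokenizations is the ratio of their perplexities. The dictionary and function names are hypothetical; only the three perplexity values come from the abstract.

```python
# Perplexities reported in the abstract (ASBC-trained models,
# evaluated on the on-line chat / instant message data set).
PERPLEXITY = {
    "character": 102.6,
    "syllable": 67.7,
    "first_symbol": 16.3,
}


def units_per_token(fine: str, coarse: str) -> float:
    """Estimate how many `fine` units correspond to one `coarse` unit.

    Assumption (not confirmed by the abstract): the number of typical
    sentences of length n is about PP**n, so the per-position ratio
    between two tokenizations is PP_fine / PP_coarse.
    """
    return PERPLEXITY[fine] / PERPLEXITY[coarse]


chars_per_first_symbol = units_per_token("character", "first_symbol")
print(round(chars_per_first_symbol, 2))  # 6.29
```

Under this assumption the ratio 102.6 / 16.3 falls between 6 and 7, which matches the range stated in the abstract.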
Bibliographic reference. Tseng, Chun-Han / Chen, Chia-Ping (2006): "Chinese input method based on reduced Mandarin phonetic alphabet", In INTERSPEECH-2006, paper 1944-Mon3FoP.11.