International Symposium on Tonal Aspects of Languages
With Emphasis on Tone Languages

Beijing, China
March 28-31, 2004

Occurrence Frequency and Transition Probability of the Chinese Four Tones

Shizuo Hiki (1), Kazuko Sunaoka (2), Liming Yang (3), Yasuyo Tokuhiro (4)

(1) School of Human Sciences; (2) School of Political Science and Economics; (3) Institute of Language Teaching; (4) Graduate School of Japanese Applied Linguistics, Waseda University, Japan

Occurrence frequency and transition probability of the Chinese four tones were analyzed statistically. The database used for the analysis was the Grammatical Knowledge-base of Contemporary Chinese (S-W. Yu, editor, Tsinghua University Press, China, 1998). In this database, one-, two-, three- and four-syllable words, about 55,000 words in total, are categorized into more than twenty parts of speech.

The occurrence frequency of the four tones deviated from 25%, depending on the number of syllables in a word and the position of the syllable within the word. In nouns, Tone-2 was always around 25%, but Tone-3 was 5% to 10% below that level. In the first syllable of words of any length, Tone-1 exceeded 25%, but it became less frequent in later syllable positions. Tone-4 was 5% above 25% even in first syllables and reached as much as 40% in last syllables.
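The abstract does not describe the counting procedure in detail. The following is a minimal sketch of a position-wise tally, assuming each word is stored as a tuple of tone numbers 1 to 4 (one per syllable) and that words are grouped by length before counting. The function name `tone_share_by_position` and the toy word list are hypothetical and are not taken from the database.

```python
from collections import Counter

# Toy input: each word is a tuple of tone numbers (1-4), one per syllable.
# These sequences are invented for illustration only; they are NOT drawn
# from the Grammatical Knowledge-base of Contemporary Chinese.
words = [(1, 4), (2, 4), (1, 3, 4), (4, 4), (3, 1), (1, 2, 4, 4)]


def tone_share_by_position(words, n_syllables):
    """Return, for words of the given length, the percentage of each tone
    at each syllable position (index 0 = first syllable)."""
    subset = [w for w in words if len(w) == n_syllables]
    if not subset:
        return []
    shares = []
    for pos in range(n_syllables):
        # Every word in the subset has a syllable at this position.
        counts = Counter(w[pos] for w in subset)
        total = sum(counts.values())
        shares.append({t: 100.0 * counts[t] / total for t in (1, 2, 3, 4)})
    return shares


for pos, share in enumerate(tone_share_by_position(words, 2), start=1):
    print(f"syllable {pos}: {share}")
```

Comparing the first and last entries of the returned list is one way to see the positional drift described above, such as Tone-4 gaining share toward the end of a word.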

Across all 38,968 syllables of the nouns, the occurrence frequencies of Tone-1, Tone-2, Tone-3 and Tone-4 were 24%, 23%, 18% and 33%, respectively. The deviations were much larger for some other parts of speech, such as verbs, adjectives, adverbs and onomatopoeia.
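The 2-bit figure cited in the following paragraph is the Shannon entropy H = -Σ p_i log2 p_i of four equally frequent tones. As a rough check, the sketch below applies the same formula to the pooled noun percentages above, renormalized because they sum to 98% as printed. This pooled value is not the position-averaged entropy the abstract reports, and the helper `tone_entropy` is hypothetical.

```python
import math


def tone_entropy(weights):
    """Shannon entropy in bits of a tone distribution given as percentages
    or counts; the weights are renormalized so they sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


uniform = tone_entropy([25, 25, 25, 25])  # four equiprobable tones
nouns = tone_entropy([24, 23, 18, 33])    # pooled noun percentages from the abstract
print(f"uniform: {uniform:.3f} bits, nouns: {nouns:.3f} bits, drop: {uniform - nouns:.3f} bit")
```

With these inputs the pooled noun entropy comes out at about 1.97 bits, a drop of roughly 0.03 bit from the 2-bit maximum, which is also under 0.05 bit.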

The entropy of the four tones, which is 2 bits when all four tones are equally frequent, decreased by less than 0.05 bit when averaged over all syllable positions. This indicates that the four tones are used fairly effectively in Chinese. In nouns, transition probabilities between Tone-2 and Tone-3 were lower than average. Transitions from Tone-1 to Tone-4 and from Tone-4 to Tone-4 in two-, three- and four-syllable words were much more frequent than average. Some other parts of speech showed characteristic transitions, such as repetition of the same tone in successive syllables.
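The abstract does not state how transition probabilities were normalized. One plausible reading, assumed here, is a first-order conditional probability P(next tone | current tone) estimated from adjacent syllables within each word. The function `transition_matrix` and the toy word list are hypothetical illustrations, not the authors' procedure or data.

```python
from collections import Counter

TONES = (1, 2, 3, 4)

# Toy input, invented for illustration only (not from the database).
words = [(1, 4), (2, 4), (1, 3, 4), (4, 4), (3, 1), (1, 2, 4, 4)]


def transition_matrix(words):
    """Estimate P(next tone | current tone) from adjacent syllable pairs
    inside each word; transitions across word boundaries are not counted."""
    pair_counts = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pair_counts[(a, b)] += 1
    matrix = {}
    for a in TONES:
        row_total = sum(pair_counts[(a, b)] for b in TONES)
        # A tone never followed by another syllable gets an all-zero row.
        matrix[a] = {
            b: (pair_counts[(a, b)] / row_total if row_total else 0.0)
            for b in TONES
        }
    return matrix


for a, row in transition_matrix(words).items():
    print(a, {b: round(p, 2) for b, p in row.items()})
```

Each row with at least one observed transition sums to 1, so an entry can be compared with 0.25, the value expected if the next tone were uniformly distributed regardless of the current one. The abstract does not say which baseline its "average" refers to.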

