First International Conference on Spoken Language Processing (ICSLP 90)
A design for a spoken Chinese corpus is proposed, consisting of five sub-corpora, C1 to C5. The design principles are: (1) mono-syllables are important not only for the recognition of isolated syllables but also for the recognition of connected spoken Chinese, because they simplify the isolation of phonetic information in the training phase; (2) a corpus including all inter-syllable triphones captures all the immediate left and right contexts of phones at syllabic boundaries. This is a logical and practical compromise between exhausting all possible syllabic transitions and keeping the corpus-building effort at a manageable level. C1 consists of 433 mono-syllables and 4 consonant clusters in each of its four versions. Each toned syllable exists in at least one of the four versions, and the 433 mono-syllables in each version include all the syllables in one of the four regular tones (yin, yang, shang and qu) plus all the neutral-toned ones. C2 is a collection of 16 digit strings, each ranging from 4 to 7 digits in length. These strings exhaust all the inter-digit triphones. C3 has 30 geographic names, each of two to five syllables. C4 consists of 859 short phrases, each six to nine syllables long, forming the bulk of the inter-syllable triphone collection. C5 is a catch-all sub-corpus composed of all the inter-syllable triphones which do not appear in C2 to C4. These triphones are very seldom used in the Chinese language today and can be ignored in recognizer training for practical purposes.
Keywords: speech database, isolated speech, connected speech, Putonghua, inter-syllable triphones, coarticulation between syllables
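The role of C5 as the remainder of the triphone inventory can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes that a triphone is a phone together with its left and right neighbours, that a triphone counts as inter-syllable when its three phones do not all fall within one syllable, and that each syllable is given as a tuple of phone labels. The function names and the pinyin-style segmentation in the usage note are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Syllable = Tuple[str, ...]        # phones of one syllable (assumed segmentation)
Triphone = Tuple[str, str, str]   # (left context, phone, right context)


def inter_syllable_triphones(utterance: List[Syllable]) -> Set[Triphone]:
    """Collect triphones whose three phones do not all lie in one syllable."""
    # Tag every phone with its syllable index so boundaries stay visible.
    tagged = [(phone, i) for i, syl in enumerate(utterance) for phone in syl]
    found: Set[Triphone] = set()
    for k in range(len(tagged) - 2):
        (p0, s0), (p1, _), (p2, s2) = tagged[k:k + 3]
        if s0 != s2:  # the window crosses at least one syllable boundary
            found.add((p0, p1, p2))
    return found


def uncovered(target: Set[Triphone],
              corpus: Iterable[List[Syllable]]) -> Set[Triphone]:
    """Return the target triphones that no utterance in the corpus contains."""
    covered: Set[Triphone] = set()
    for utterance in corpus:
        covered |= inter_syllable_triphones(utterance)
    return target - covered
```

Under these assumptions, `inter_syllable_triphones([("b", "ei"), ("j", "ing")])` yields the two triphones that straddle the boundary, `("b", "ei", "j")` and `("ei", "j", "ing")`. Calling `uncovered` with a full target inventory and the utterances of C2 to C4 would return the set that a catch-all sub-corpus such as C5 has to supply.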
Bibliographic reference. Chan, Chorkin / Wang, Ren-hua (1990): "The HKU-USTC speech corpus", In ICSLP-1990, 993-996.