4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
A large Putonghua corpus is introduced, designed primarily to support research in Chinese speech recognition and analysis, as well as the evaluation of recognition systems. The corpus consists of four major sub-corpora: isolated syllables, multi-syllable words, sentences, and telephone speech. It is carefully designed to cover all the phones and monosyllables of Putonghua, as well as its co-articulation effects, while keeping redundancy as low as possible. This parsimonious corpus makes it possible to acquire acoustic-phonetic knowledge for isolated-word recognition and continuous Chinese speech recognition, to provide speech data for training telephone speech recognizers, and to provide a common test base for assessing recognizer performance.
Bibliographic reference. Wang, Ren-Hua / Xia, Deyu / Ni, Jinfu / Liu, Bicheng (1996): "USTC95 - a putonghua corpus", In ICSLP-1996, 1894-1897.