4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

USTC95---A Putonghua Corpus

Ren-Hua Wang, Deyu Xia, Jinfu Ni, Bicheng Liu

University of Science & Technology of China, Hefei, China

A large Putonghua corpus is introduced, designed primarily to support research in Chinese speech recognition and analysis, and the evaluation of recognition systems. The corpus consists of four major sub-corpora: isolated syllables, multi-syllable words, sentences, and telephone speech. Through careful design, it covers all the phones and monosyllables of Putonghua, as well as its co-articulation effects, while keeping redundancy as low as possible. This parsimonious corpus makes it possible to acquire acoustic-phonetic knowledge for isolated-word recognition and continuous Chinese speech recognition, to provide speech data for training telephone speech recognizers, and to provide a common test base for assessing recognizer performance.

Bibliographic reference.  Wang, Ren-Hua / Xia, Deyu / Ni, Jinfu / Liu, Bicheng (1996): "USTC95---a putonghua corpus", In ICSLP-1996, 1894-1897.