4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Speaker-independent Dictation of Chinese Speech with 32K Vocabulary

Bo Xu, Bing Ma, Shuwu Zhang, Fei Qu, Taiyi Huang

Speech Research Group, National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Whereas earlier machines took isolated syllables as input units and required tedious speaker enrollment, our research focuses on speaker-independent, word-based dictation. A purpose-designed 120-speaker database was built for training. Acoustic models that depend on inter-syllable context, tone and endpoint are used together with MFCC features, which proved promising. Two-pass acoustic matching speeds up recognition by taking full advantage of the monosyllabic structure of Chinese speech. Complete word bigram and trigram models serve as the language processing module. With these combined efforts, the system reaches 90% character accuracy while running in almost real time on a Pentium PC without DSP assistance.
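The result above is reported as character accuracy. The abstract does not define the metric, so the sketch below assumes the common edit-distance definition, accuracy = (N - S - D - I) / N, where N is the number of reference characters and S, D, I are the substitutions, deletions and insertions in a minimum-cost alignment. The paper's actual scoring procedure may differ, and the function name `char_accuracy` is hypothetical.

```python
def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy (N - S - D - I) / N from a Levenshtein alignment.

    Assumes unit costs, so the minimum edit distance equals S + D + I
    of the best alignment. The result can be negative when insertions
    outnumber the correctly recognized characters.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference transcription must not be empty")
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference characters
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    errors = dp[n][m]
    return (n - errors) / n


# One substituted character out of eight gives 7/8 accuracy.
print(char_accuracy("我们研究语音识别", "我们研究语言识别"))  # 0.875
```

Scoring at the character level rather than the word level is natural for Chinese, where written text has no explicit word boundaries and each character corresponds to one syllable.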


Bibliographic reference.  Xu, Bo / Ma, Bing / Zhang, Shuwu / Qu, Fei / Huang, Taiyi (1996): "Speaker-independent dictation of Chinese speech with 32k vocabulary", In ICSLP-1996, 2320-2323.