5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Automatic Recognition of Continuous Cantonese Speech With Very Large Vocabulary

Alfred Ying Pang Ng (1), L. W. Chan (1), P. C. Ching (2)

(1) Department of Computer Science and Engineering, (2) Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong

This paper presents the first published results for automatic recognition of continuous Cantonese speech with a very large vocabulary. The size of the vocabulary covered by the system is about the same as that encountered in the Hong Kong local Chinese newspaper Wen Hui Bao. The system covers 6335 Chinese characters, and a large number of Chinese words can be formed by combining these characters. The input to the system is the endpointed speech waveform of a sentence or phrase; the output is the corresponding Big5-coded Chinese characters. In developing the recognition system, we have devised new methods for 1) construction of a continuous Cantonese speech database, 2) lexical tone recognition in continuous Cantonese speech, and 3) integration of lexical tone and base syllable recognition results. The speaker-dependent recognition rates for Chinese characters, base syllables and lexical tones are 90.94%, 94.73% and 69.7%, respectively.


Bibliographic reference. Ng, Alfred Ying Pang / Chan, L. W. / Ching, P. C. (1997): "Automatic recognition of continuous Cantonese speech with very large vocabulary", in EUROSPEECH-1997, 1551-1554.