In this paper we present a large-vocabulary, word-based automatic Mandarin dictation system. The system takes isolated words as input, which avoids the unnaturalness of the syllable-by-syllable input used by many existing systems and reduces the acoustic confusion among the 414 Mandarin syllables that arises in isolated-syllable input systems. Processing proceeds in two stages. The first, acoustic processing stage comprises two modules: a large-vocabulary word recognizer and a tone recognizer. The word recognizer generates N-best word candidates using a tree-trellis fast search, and the tone recognizer then reduces the number of homonyms among those candidates. In the second, linguistic processing stage, a statistical language model is applied to the word lattice generated in the first stage, and the most likely word sequence and the corresponding Chinese character string are decoded via a Viterbi search. The system was evaluated on a speaker-trained database, yielding a word accuracy of 85.5% and a character accuracy of 88.3%. A real-time demo system has also been implemented on an HP-9000/735 workstation.
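The second-stage decoding described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram language model, log-domain scores, a lattice with one list of candidates per spoken word, and unique candidates within each position. The function name `viterbi_decode`, the `floor` back-off value, and all scores in the toy example are hypothetical.

```python
import math


def viterbi_decode(lattice, bigram_logp, floor=-20.0):
    """Find the best word sequence through a word lattice.

    lattice: list of positions; each position is a list of
             (word, acoustic_log_score) candidates (assumed unique per position).
    bigram_logp: dict mapping (previous_word, word) to a log probability.
    floor: assumed log probability for unseen bigrams (hypothetical value).
    Returns (best_word_sequence, best_total_score).
    """
    if not lattice:
        return [], 0.0

    # Each trellis column maps word -> (best score so far, best predecessor).
    column = {}
    for word, acoustic in lattice[0]:
        lm = bigram_logp.get(("<s>", word), floor)
        column[word] = (acoustic + lm, None)
    trellis = [column]

    for position in lattice[1:]:
        previous = trellis[-1]
        column = {}
        for word, acoustic in position:
            best_prev, best_score = None, -math.inf
            for prev_word, (prev_score, _) in previous.items():
                lm = bigram_logp.get((prev_word, word), floor)
                score = prev_score + lm + acoustic
                if score > best_score:
                    best_prev, best_score = prev_word, score
            column[word] = (best_score, best_prev)
        trellis.append(column)

    # Backtrace from the best-scoring word in the final column.
    last = trellis[-1]
    word = max(last, key=lambda w: last[w][0])
    best_score = last[word][0]
    path = [word]
    for t in range(len(trellis) - 1, 0, -1):
        word = trellis[t][word][1]
        path.append(word)
    path.reverse()
    return path, best_score


# Toy lattice: the second position holds homophone candidates (all "gongshi").
toy_lattice = [
    [("數學", -1.0)],
    [("公事", -1.5), ("公式", -2.0), ("攻勢", -2.5)],
]
# Toy bigram log probabilities (made-up values for illustration only).
toy_bigrams = {("<s>", "數學"): -2.0, ("數學", "公式"): -1.0}

words, score = viterbi_decode(toy_lattice, toy_bigrams)
characters = "".join(words)  # character string read off the decoded words
print(words, characters, score)
```

In this toy lattice the language model term selects 公式 even though 公事 has the higher acoustic score, which is the kind of homonym disambiguation the linguistic processing stage is meant to provide.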
Bibliographic reference. Chen, Jung-Kuei / Lee, Lin-Shan / Soong, Frank K. (1995): "Large vocabulary, word-based Mandarin dictation system", In EUROSPEECH-1995, 285-288.