Automatically creating a language model for recognizing spontaneously pronounced sentences requires a large corpus of training sentences in which word and phrase boundaries are labeled automatically during sentence generation. The training corpus contains about 10,000 domain-dependent sentences. Each sentence is unique and was generated from sentence templates (prototypes) using a stochastic context-free grammar. The syntactic structure of the Czech sentences from which the sentence prototypes were derived is described by special sentence formulas.
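The abstract gives neither the grammar nor the generation procedure. As a rough illustration only, the following Python sketch samples sentences from a small stochastic context-free grammar, marks phrase boundaries with brackets (word boundaries are the spaces between words), and keeps only unique sentences by rejecting duplicates. The toy grammar, the `PHRASES` set that decides which nonterminals form a phrase, the bracket notation, the rejection strategy for uniqueness, and the function names `expand`, `label` and `build_corpus` are all hypothetical assumptions, not the authors' method; the paper's sentence formulas and prototypes are not modeled.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
# Symbols that are keys of GRAMMAR are nonterminals; anything else is a word.
GRAMMAR = {
    "S": [(1.0, ["QUESTION", "NP", "PP"])],
    "QUESTION": [(0.6, ["kdy", "jede"]), (0.4, ["kdy", "odjíždí"])],
    "NP": [(0.7, ["vlak"]), (0.3, ["autobus"])],
    "PP": [(0.5, ["do", "Prahy"]), (0.3, ["do", "Brna"]), (0.2, ["do", "Plzně"])],
}

# Assumption: every word derived below one of these nonterminals forms one phrase.
PHRASES = {"QUESTION", "NP", "PP"}


def expand(symbol: str, rng: random.Random) -> list[tuple[str, ...]]:
    """Derive `symbol` and return its words grouped into phrases."""
    if symbol not in GRAMMAR:
        # A word not covered by any phrase nonterminal becomes a one-word phrase.
        return [(symbol,)]
    rules = GRAMMAR[symbol]
    # Choose a production with probability proportional to its weight.
    rhs = rng.choices([r for _, r in rules], weights=[p for p, _ in rules])[0]
    parts = [phrase for child in rhs for phrase in expand(child, rng)]
    if symbol in PHRASES:
        # Merge everything derived below this node into a single phrase.
        return [tuple(word for phrase in parts for word in phrase)]
    return parts


def label(phrases: list[tuple[str, ...]]) -> str:
    """Render phrases in brackets; spaces mark the word boundaries."""
    return " ".join("[" + " ".join(phrase) + "]" for phrase in phrases)


def build_corpus(n: int, seed: int = 0, max_tries: int = 10_000) -> list[str]:
    """Sample up to n unique labeled sentences, discarding duplicates."""
    rng = random.Random(seed)
    seen: set[str] = set()
    corpus: list[str] = []
    tries = 0
    while len(corpus) < n and tries < max_tries:
        tries += 1
        sentence = label(expand("S", rng))
        if sentence not in seen:
            seen.add(sentence)
            corpus.append(sentence)
    return corpus


if __name__ == "__main__":
    for line in build_corpus(5, seed=1):
        print(line)
```

Each printed line has the form `[kdy jede] [vlak] [do Prahy]`. Because this toy grammar can derive only 12 distinct sentences, asking for more returns at most 12; a grammar large enough to yield about 10,000 unique sentences would need many more rules.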
Bibliographic reference. Kleckova, Jana / Matousek, Vaclav / Netrvalova, Jana (1995): "An Automatic Creation of the Language Model for the Spontaneous Czech Speech Recognizer", In EUROSPEECH-1995, 1185–1188.