EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Automatic Analysis of Real Dialogues and Generating of Training Corpora

Jana Schwarz (1), Vaclav Matousek (2)

(1) Technical University of Dresden, Germany
(2) University of West Bohemia in Pilsen, Czech Republic

Developing computerized information retrieval dialogue systems that communicate with the user in natural language requires an effective training procedure, with which the main modules of the dialogue system can be developed partly automatically. This paper describes an attempt to create the generating sentence templates automatically, using a program package that implements a specially developed method of quantitative linguistic analysis of transcribed real dialogues. First, the program package generates a set of formulas (templates), written in a special grammar, that describe the syntactic structure of the required sentences. Second, it generates a large corpus of unique training sentences using the sentence templates and a stochastic context-free grammar. The experimentally created corpus was used to train the modules of a city information dialogue system.
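
The second step, generating unique sentences from templates with a stochastic context-free grammar, can be illustrated with a minimal sketch. It is not the authors' program package: the toy grammar, its probabilities, and the names GRAMMAR, expand, and generate_corpus are all assumptions made for illustration. Each nonterminal is rewritten by a rule drawn according to its probability, and duplicates are discarded so that the corpus contains only unique sentences.

```python
import random

# Hypothetical toy grammar (not from the paper):
# nonterminal -> list of (probability, expansion) rules.
GRAMMAR = {
    "S": [(0.6, ["QUESTION"]), (0.4, ["REQUEST"])],
    "QUESTION": [(1.0, ["where", "is", "PLACE"])],
    "REQUEST": [(1.0, ["show", "me", "PLACE"])],
    "PLACE": [
        (0.5, ["the", "town", "hall"]),
        (0.3, ["the", "railway", "station"]),
        (0.2, ["the", "nearest", "pharmacy"]),
    ],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word: emit as-is
    rules = GRAMMAR[symbol]
    weights = [prob for prob, _ in rules]
    # Draw one rule according to the rule probabilities.
    _, expansion = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in expansion:
        words.extend(expand(child, rng))
    return words


def generate_corpus(n_unique, seed=0, max_tries=10000):
    """Sample sentences until n_unique distinct ones are collected."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    corpus = set()
    tries = 0
    while len(corpus) < n_unique and tries < max_tries:
        corpus.add(" ".join(expand("S", rng)))
        tries += 1
    return sorted(corpus)


if __name__ == "__main__":
    for sentence in generate_corpus(6):
        print(sentence)
```

The max_tries bound is a design choice of this sketch: a small grammar can produce only a finite number of distinct sentences (six here), so the loop must stop even if more unique sentences are requested than the grammar can yield.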


Bibliographic reference. Schwarz, Jana / Matousek, Vaclav (2001): "Automatic analysis of real dialogues and generating of training corpora", in EUROSPEECH-2001, 2201-2204.