EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Two-Stage Probabilistic Approach to Text Segmentation

Yi-Chia Chen, Yi-Chung Lin

Industrial Technology Research Institute, Taiwan

For telephone-based spoken dialogue systems, the responses to users should be specific and short. It is therefore highly desirable to segment a topical text into specific event segments that can be used to answer users' queries. However, the lexical cohesion approach, which has been widely used to segment text into topics, is not suitable for segmenting text into smaller units such as events. In this paper, we present a two-stage approach to partitioning text into event segments. In the first stage, a trigram chunk tagger assigns the segmentation tags. In the second stage, unreliable segmentation tags are detected and then verified by a probabilistic verification model. Compared with the chunk tagger, the verification model can exploit more contextual information and is less sensitive to the sparseness of training data. Experimental results show that the proposed two-stage approach significantly outperforms the chunk tagger approach. The improvements in precision and recall rates range from 27% to 83% across the different testing tasks.
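
The control flow of such a two-stage scheme can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how unreliable tags are detected or what features the verification model uses. The sketch assumes a B/I tag set ("B" begins a new event segment), a per-tag confidence score, and a fixed confidence threshold; the names two_stage_segment, to_segments, toy_tagger and toy_verifier are hypothetical, and the two toy models are simple stand-ins for a trained trigram chunk tagger and a trained probabilistic verifier.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed tag set: "B" begins a new event segment, "I" continues the current one.
Tagged = Tuple[str, float]  # (tag, confidence in [0, 1]); the confidence score is an assumption


def two_stage_segment(
    tokens: Sequence[str],
    tagger: Callable[[Sequence[str]], List[Tagged]],
    verifier: Callable[[Sequence[str], int], str],
    threshold: float = 0.8,  # assumed cut-off for "unreliable" tags
) -> List[str]:
    """Stage 1 tags every token; stage 2 re-decides only the low-confidence tags."""
    final_tags: List[str] = []
    for i, (tag, confidence) in enumerate(tagger(tokens)):
        if confidence < threshold:
            # The verifier sees the whole token sequence, so it may use wider context.
            tag = verifier(tokens, i)
        final_tags.append(tag)
    return final_tags


def to_segments(tokens: Sequence[str], tags: Sequence[str]) -> List[List[str]]:
    """Group tokens into segments, opening a new segment at every "B" tag."""
    segments: List[List[str]] = []
    for token, tag in zip(tokens, tags):
        if tag == "B" or not segments:
            segments.append([token])
        else:
            segments[-1].append(token)
    return segments


def toy_tagger(tokens: Sequence[str]) -> List[Tagged]:
    """Stand-in for stage 1: unsure about tokens that follow a sentence-final period."""
    result: List[Tagged] = []
    for i in range(len(tokens)):
        if i == 0:
            result.append(("B", 1.0))
        elif tokens[i - 1].endswith("."):
            result.append(("I", 0.5))
        else:
            result.append(("I", 0.95))
    return result


def toy_verifier(tokens: Sequence[str], i: int) -> str:
    """Stand-in for stage 2: places a boundary after a sentence-final period."""
    return "B" if i > 0 and tokens[i - 1].endswith(".") else "I"


tokens = ["Rain", "today.", "Sunny", "tomorrow."]
tags = two_stage_segment(tokens, toy_tagger, toy_verifier)
segments = to_segments(tokens, tags)
```

In this toy run, only the tag for "Sunny" falls below the threshold, so it is the only tag passed to the verifier. Confident first-stage tags are left untouched, which keeps the second stage focused on the decisions the tagger is least sure about.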


Bibliographic reference.  Chen, Yi-Chia / Lin, Yi-Chung (2001): "Two-stage probabilistic approach to text segmentation", In EUROSPEECH-2001, 1081-1084.