This paper presents a supervised approach to extractive summarization of spoken documents that treats the utterance clusters in a document as hidden variables. Utterances in important clusters may be jointly included in the summary, while those in less important clusters may be excluded as a whole. Summaries are therefore selected not only by the conventional principle of including the most important utterances while minimizing redundancy, but also according to the hidden cluster structure of the document. The cluster structure is not known in advance, but it can be inferred from the document jointly with the summary by a structured SVM learned from training examples. Encouraging results were obtained in preliminary experiments on a lecture corpus.
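The idea of choosing a summary and a hidden cluster structure jointly can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `joint_score` and `joint_inference`, the two-term linear score, the weights, the importance values, and the brute-force search over a small set of candidate clusterings. It only shows the general pattern of maximizing a linear score over a summary y and a hidden variable h together.

```python
from itertools import combinations

def joint_score(summary, clusters, importance, w_utt=1.0, w_cluster=0.5):
    """Hypothetical linear score for a summary y and a clustering h.

    The first term rewards important utterances. The second term rewards
    clusters whose utterances are all included or all excluded.
    """
    utt_term = sum(importance[i] for i in summary)
    consistent = sum(1 for c in clusters if c <= summary or not (c & summary))
    return w_utt * utt_term + w_cluster * consistent

def joint_inference(importance, candidate_clusterings, budget):
    """Brute-force argmax over (summary, clustering) pairs.

    Only feasible for toy inputs; a real system would need an
    efficient inference procedure.
    """
    best = None
    for k in range(budget + 1):
        for subset in combinations(range(len(importance)), k):
            y = frozenset(subset)
            for h in candidate_clusterings:
                s = joint_score(y, h, importance)
                if best is None or s > best[0]:
                    best = (s, y, h)
    return best

# Made-up importance scores for four utterances (illustrative only).
importance = [0.9, 0.8, 0.1, 0.2]
# Two hypothetical candidate clusterings of the four utterances.
h1 = (frozenset({0, 1}), frozenset({2, 3}))
h2 = (frozenset({0, 2}), frozenset({1, 3}))

score, summary, clustering = joint_inference(importance, [h1, h2], budget=2)
print(sorted(summary), clustering is h1)
```

In this toy setting the search selects utterances 0 and 1 together with the clustering that groups them, since that pair is both important and consistent with a whole cluster. In a learned model the weights would come from training rather than being fixed by hand.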
Bibliographic reference. Shiang, Sz-Rung / Lee, Hung-yi / Lee, Lin-shan (2013): "Supervised spoken document summarization based on structured support vector machine with utterance clusters as hidden variables", In INTERSPEECH-2013, 2728-2732.