13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Confidence Measure for Speech Indexing Based on Latent Dirichlet Allocation

Grégory Senay, Georges Linarès

Laboratoire Informatique d'Avignon, University of Avignon, Avignon, France

This paper presents a confidence measure for speech indexing that aims to predict the indexing quality of a speech document in a Spoken Document Retrieval (SDR) task. We first describe how the indexing quality of a speech document is evaluated, then present our method for predicting it. The method is based on confidence measures provided by an automatic speech recognition system and on the detection of semantic outliers, implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the ESTER2 French broadcast news campaign in a classical SDR scenario where users submit text queries to a search engine. Results demonstrate an overall improvement when the detection is performed with the LDA model; the detection rate is always above 70%.

Index Terms: speech indexing, confidence measure, spoken document retrieval, latent Dirichlet allocation
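
As an illustration of the general idea summarized in the abstract, the following minimal sketch flags a recognized word as suspect when either its ASR confidence is low or it is unlikely under the document's topic mixture (a semantic outlier). It is not the authors' implementation: the topic-word table, the topic mixture, both thresholds, and the function names `word_given_doc`, `flag_suspect_words` and `document_confidence` are hypothetical, and in practice the topic distributions would come from a trained LDA model rather than a hand-written table.

```python
from typing import Dict, List, Tuple

# Toy P(word | topic) table. Hypothetical values for illustration only;
# a real system would estimate these with a trained LDA model.
TOPIC_WORD: Dict[str, Dict[str, float]] = {
    "politics": {"election": 0.4, "minister": 0.3, "vote": 0.25, "goal": 0.05},
    "sport": {"goal": 0.5, "match": 0.3, "team": 0.15, "vote": 0.05},
}


def word_given_doc(word: str, topic_mix: Dict[str, float]) -> float:
    """P(word | doc) = sum over topics of P(word | topic) * P(topic | doc)."""
    return sum(
        p_topic * TOPIC_WORD[topic].get(word, 0.0)
        for topic, p_topic in topic_mix.items()
    )


def flag_suspect_words(
    hypothesis: List[Tuple[str, float]],
    topic_mix: Dict[str, float],
    asr_threshold: float = 0.5,        # assumed cut-off on ASR confidence
    semantic_threshold: float = 0.05,  # assumed cut-off on P(word | doc)
) -> List[str]:
    """Return words with low ASR confidence or low topical likelihood."""
    suspects = []
    for word, asr_conf in hypothesis:
        semantic_score = word_given_doc(word, topic_mix)
        if asr_conf < asr_threshold or semantic_score < semantic_threshold:
            suspects.append(word)
    return suspects


def document_confidence(
    hypothesis: List[Tuple[str, float]], topic_mix: Dict[str, float]
) -> float:
    """Fraction of recognized words that are not flagged as suspect."""
    if not hypothesis:
        return 0.0
    suspects = flag_suspect_words(hypothesis, topic_mix)
    return 1.0 - len(suspects) / len(hypothesis)


# Recognized words paired with (made-up) ASR confidence scores.
hypothesis = [("election", 0.95), ("minister", 0.4), ("match", 0.9), ("vote", 0.8)]
# Assumed topic mixture P(topic | doc) for a mostly political document.
topic_mix = {"politics": 0.9, "sport": 0.1}

print(flag_suspect_words(hypothesis, topic_mix))
print(document_confidence(hypothesis, topic_mix))
```

In this toy run, "minister" is flagged because of its low ASR confidence, while "match" is flagged as a semantic outlier even though the recognizer was confident about it. The second case is the kind of error a purely acoustic confidence score would miss.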


Bibliographic reference. Senay, Grégory / Linarès, Georges (2012): "Confidence measure for speech indexing based on latent Dirichlet allocation", in INTERSPEECH-2012, 2302-2305.