2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)

Penang, Malaysia
September 11-12, 2014

Proper Name Retrieval from Diachronic Documents for Automatic Speech Transcription using Lexical and Temporal Context

Irina Illina (1), Dominique Fohr (1), Georges Linarès (2)

(1) Speech Group, LORIA-INRIA, Villers-les-Nancy, France
(2) LIA – University of Avignon, France

Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as anchors to collect new, linked proper names from the diachronic corpus. Our assumption is that time is an important feature for capturing name-to-context dependencies, which was confirmed by temporal mismatch experiments. We studied a method based on mutual information and proposed a new method based on a cosine-similarity measure to dynamically augment the vocabulary of the automatic speech recognition system. Recognition results show a significant reduction of the word error rate when the augmented vocabulary is used for broadcast news transcription.
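To make the cosine-similarity idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes bag-of-words count vectors, a diachronic corpus already restricted to the same time period as the audio, and proper names already tagged in each document. The function names `cosine` and `rank_new_names`, the input format, and the `top_n` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_new_names(transcript_words, diachronic_docs, vocabulary, top_n=2):
    """Rank out-of-vocabulary proper names by the similarity of their contexts.

    diachronic_docs is a list of (words, proper_names) pairs, assumed to be
    drawn from the same time period as the audio (hypothetical input format).
    """
    query = Counter(transcript_words)
    contexts = {}
    for words, names in diachronic_docs:
        for name in names:
            if name in vocabulary:
                continue  # only names missing from the vocabulary are candidates
            # Accumulate the words of every document that mentions the name.
            contexts.setdefault(name, Counter()).update(words)
    scored = [(cosine(query, ctx), name) for name, ctx in contexts.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    vocabulary = {"obama", "president", "visit", "election", "match", "goal"}
    docs = [
        (["president", "obama", "romney", "visit", "election"], ["obama", "romney"]),
        (["match", "goal", "striker", "benzema"], ["benzema"]),
    ]
    first_pass = ["president", "obama", "election"]
    print(rank_new_names(first_pass, docs, vocabulary, top_n=1))
```

In this toy setup the first-pass transcript shares words only with the first document, so the out-of-vocabulary name from that document is ranked first and would be the one added to the recognizer's vocabulary before a second decoding pass.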

Index Terms: speech recognition, out-of-vocabulary words, proper names, vocabulary augmentation


Bibliographic reference. Illina, Irina / Fohr, Dominique / Linarès, Georges (2014): "Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context", in SLAM-2014, 29-33.