INTERSPEECH 2006 - ICSLP
The goal of event-based (EB) systems is to detect the occurrence of important elements in the speech signal for different sound classes. In a speech recognition system, events can be combined to detect phones, words or sentences, or to identify landmarks to which a classifier or a decoder could be synchronized. The time boundaries of the events are then as important as the events themselves. Accordingly, the assessment of EB systems must take into account not only the correctly identified sequence of events but also their correct time localization. Usually, only the token sequence or its boundaries are considered in the evaluation. In this paper we propose an extension to the standard recognition evaluation procedure that combines recognition and segmentation performance. In our proposal, a modified Levenshtein algorithm is used to align labeled and recognized events, with the degree of overlap between them included in the local distance definition. We evaluate our approach on a rule-based event detector using the TIMIT corpus, and compare the results of the new evaluation procedure with standard metrics. The results show that accuracy drops when the alignment is made as a function of the overlap between labels; nevertheless, the agreement with the labeled boundaries is significantly improved.
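The idea of an overlap-aware alignment can be illustrated with a minimal sketch. It is not the cost definition used in the paper, which the abstract does not specify. The sketch assumes that each event is a `(label, start, end)` tuple, that insertions and deletions cost 1, and that pairing two events costs 1 when their labels differ and `1 - overlap_ratio` when they match, where the overlap ratio is the temporal intersection divided by the temporal union. The names `Event`, `overlap_ratio`, `local_cost` and `align_cost` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event layout: (label, start_time, end_time), times in seconds.
Event = Tuple[str, float, float]


def overlap_ratio(a: Event, b: Event) -> float:
    """Temporal intersection over temporal union of two events, in [0, 1]."""
    inter = min(a[2], b[2]) - max(a[1], b[1])
    union = max(a[2], b[2]) - min(a[1], b[1])
    if inter <= 0 or union <= 0:
        return 0.0
    return inter / union


def local_cost(ref: Event, hyp: Event) -> float:
    """Assumed local distance: 1 for a label mismatch, otherwise
    1 - overlap, so a perfectly aligned correct event costs 0."""
    if ref[0] != hyp[0]:
        return 1.0
    return 1.0 - overlap_ratio(ref, hyp)


def align_cost(ref: List[Event], hyp: List[Event], ins_del: float = 1.0) -> float:
    """Levenshtein-style dynamic programming with the overlap-aware local cost."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * ins_del          # delete all reference events so far
    for j in range(1, m + 1):
        d[0][j] = j * ins_del          # insert all hypothesis events so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + ins_del,                              # deletion
                d[i][j - 1] + ins_del,                              # insertion
                d[i - 1][j - 1] + local_cost(ref[i - 1], hyp[j - 1]),  # pairing
            )
    return d[n][m]


# Same label sequence, but the recognized boundary is misplaced by 0.5 s.
reference = [("V", 0.0, 1.0), ("S", 1.0, 2.0)]
recognized = [("V", 0.0, 0.5), ("S", 0.5, 2.0)]
print(align_cost(reference, reference))   # identical events
print(align_cost(reference, recognized))  # penalized for the boundary shift
```

Under these assumptions, a plain Levenshtein alignment would score the second pair as error-free because the label sequences are identical, whereas the overlap-aware cost is nonzero. This is consistent with the drop in accuracy reported above when the alignment depends on the overlap between labels.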
Bibliographic reference. Lopes, Carla / Perdigão, Fernando (2006): "Improved performance evaluation of speech event detectors", In INTERSPEECH-2006, paper 1615-Thu1A1O.4.