Speaker Diarization and Automatic Speech Recognition (ASR) have been topics of research for decades, and evaluating the resulting systems has been required for almost as long. Following the NIST initiatives, two metrics have become standard for these evaluations: the Diarization Error Rate (DER) and the Word Error Rate (WER). The initial definitions of these metrics and, more importantly, their implementations were designed for single-speaker speech. One of the aims of the OSEO Quaero and the ANR ETAPE projects was to investigate the capabilities of Diarization and ASR systems in the presence of overlapping speech. Evaluating such systems required extending the metric definitions and adapting the algorithms used to implement them. This paper presents these extensions and adaptations, along with the open tools that provide them.
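For orientation, the conventional single-speaker forms of the two metrics can be sketched as follows. This is a minimal illustration of the standard definitions only, not the overlapping-speech extensions the paper presents and not the tools it describes; the function names and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference words.

    Assumption: words are whitespace-separated and compared exactly,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """Conventional DER from error durations in seconds.

    Assumption: the durations have already been measured against a
    reference segmentation; this only combines them.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(diarization_error_rate(2.0, 1.0, 3.0, 60.0))
```

Both functions assume that each word or time span belongs to a single speaker, which is the limitation the abstract points to when speech overlaps.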
Bibliographic reference. Galibert, Olivier (2013): "Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech", in INTERSPEECH-2013, 1131-1134.