Odyssey 2012 - The Speaker and Language Recognition Workshop

June 25-28, 2012

Cisco's Speaker Segmentation and Recognition System

Sashin Kajarekar, Aparna Khare, Matthias Paulik, Neha Agrawal, Panchi Panchapagesan, Ananth Sankar, Satish Gannu

Cisco Systems, Inc, San Jose, CA, USA

This paper presents Cisco's speaker segmentation and recognition (SSR) system, which is part of a commercial product. Cisco SSR combines speaker segmentation and speaker recognition algorithms with a crowdsourcing approach to create speaker metadata. This metadata makes enterprise videos more accessible and navigable, both on its own and in combination with other forms of metadata such as keywords. The paper illustrates the functional blocks of SSR and a typical user interface, and describes the specific implementations of the speaker segmentation and recognition algorithms. It also describes the evaluation data, protocols, and results for both the speaker segmentation and speaker recognition tasks. Speaker segmentation results show that Cisco SSR performs comparably to the state of the art on RT-03F data. Speaker recognition results show that a small set of user-provided labels can be effectively transferred to a continuously expanding set of videos.

Bibliographic reference. Kajarekar, Sashin / Khare, Aparna / Paulik, Matthias / Agrawal, Neha / Panchapagesan, Panchi / Sankar, Ananth / Gannu, Satish (2012): "Cisco's speaker segmentation and recognition system", In Odyssey-2012, 151-156.