Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Novel Time Domain Multi-Class SVMs for Landmark Detection

Rahul Chitturi (1), Mark Hasegawa-Johnson (2)

(1) University of Texas at Dallas, USA; (2) University of Illinois at Urbana-Champaign, USA

The training of precise speech recognition models depends on accurate segmentation of the phonemes in a training corpus. Segmentation is typically performed using HMMs, but recent speech recognition work suggests that the transient acoustic features characteristic of manner-class phoneme boundaries (landmarks) may be more precisely localized using acoustic classifiers specifically designed for the task of landmark detection. This paper empirically explores new features suited to landmark detection and applies multi-class SVMs that improve on the time alignment of phoneme boundaries proposed by binary SVMs and by an HMM-based speech recognizer. On a standard benchmark data set (a database of Telugu, an official language of India spoken by 75 million people), we achieve new state-of-the-art performance, reducing RMS phone boundary alignment error from 32 ms to 22 ms.
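The reported metric can be illustrated with a minimal sketch. It assumes that each hypothesized boundary has already been paired one-to-one with a reference boundary; the abstract does not describe the paper's actual scoring procedure, so the function name `rms_boundary_error_ms` and the boundary times below are hypothetical.

```python
import math


def rms_boundary_error_ms(reference_ms, hypothesis_ms):
    """Root-mean-square difference between paired boundary times (ms).

    Assumes the two lists are already aligned one-to-one; how the paper
    pairs boundaries is not stated in the abstract.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not reference_ms:
        raise ValueError("need at least one boundary pair")
    squared = [(h - r) ** 2 for r, h in zip(reference_ms, hypothesis_ms)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical boundary times in milliseconds, not data from the paper.
reference = [120.0, 310.0, 545.0, 790.0]
hypothesis = [130.0, 290.0, 575.0, 770.0]
error = rms_boundary_error_ms(reference, hypothesis)

# Relative reduction implied by the reported figures (32 ms to 22 ms).
relative_reduction = (32.0 - 22.0) / 32.0

print(f"RMS error on the toy data: {error:.1f} ms")
print(f"Relative reduction, 32 ms to 22 ms: {relative_reduction:.1%}")
```

Because the errors are squared before averaging, a few badly misplaced boundaries weigh more heavily on this metric than many small deviations do.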


Bibliographic reference.  Chitturi, Rahul / Hasegawa-Johnson, Mark (2006): "Novel time domain multi-class SVMs for landmark detection", In INTERSPEECH-2006, paper 1904-Thu1CaP.14.