Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Phonetic Landmark Detection for Automatic Language Identification

David Harwath, Mark Hasegawa-Johnson

University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana, IL, USA

This paper presents a method of augmenting shifted-delta cepstral coefficients (SDCCs) with the classification outputs of an array of support vector machines (SVMs) trained to detect a set of manner and place features in telephone speech. The SVM array provides a broad phoneme classification. Concatenating its outputs with the SDCCs yields a hybrid feature vector for each acoustic frame, on which a set of Gaussian mixture models (GMMs) can be trained to perform automatic language identification (LID). The NTIMIT telephone-band speech corpus was used to train the SVM-based distinctive feature recognizers, while the NIST CallFriend telephone corpus was used for training and testing the rest of the system.
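
As an illustration of the hybrid feature vector described above, the following is a minimal sketch, not the authors' implementation. It computes shifted-delta cepstra using the common d-P-k formulation and appends one vector of SVM outputs to each frame. The function names, the parameter values (d=1, P=3, k=7), the clamping of out-of-range frames at the utterance edges, and the placeholder SVM scores are all assumptions made for illustration; the abstract does not state any of them.

```python
def shifted_delta_cepstra(cepstra, d=1, P=3, k=7):
    """Stack k delta blocks per frame (assumed d-P-k SDC formulation).

    cepstra: list of T frames, each a list of N cepstral coefficients.
    Returns a list of T frames, each of length N * k.
    """
    T = len(cepstra)

    def frame(t):
        # Assumption: clamp indices so edge frames reuse the first/last frame.
        return cepstra[min(max(t, 0), T - 1)]

    sdc = []
    for t in range(T):
        stacked = []
        for i in range(k):
            ahead = frame(t + i * P + d)
            behind = frame(t + i * P - d)
            # Delta block i: c(t + iP + d) - c(t + iP - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def hybrid_features(sdc, svm_outputs):
    """Concatenate each SDC frame with that frame's SVM detector outputs."""
    if len(sdc) != len(svm_outputs):
        raise ValueError("SDC and SVM output streams must have equal length")
    return [list(s) + list(v) for s, v in zip(sdc, svm_outputs)]


if __name__ == "__main__":
    # Toy input: 30 frames of 2 cepstral coefficients (hypothetical data).
    cepstra = [[float(t), 2.0 * t] for t in range(30)]
    # Placeholder scores standing in for 3 distinctive-feature SVM detectors.
    svm_outputs = [[0.5, -0.5, 1.0] for _ in range(30)]

    sdc = shifted_delta_cepstra(cepstra)
    hybrid = hybrid_features(sdc, svm_outputs)
    print(len(hybrid), len(hybrid[0]))
```

In a system of the kind the abstract outlines, frames like these would be pooled per language to train one GMM per language, and a test utterance would be assigned to the language whose GMM gives the highest likelihood.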

Index Terms— Support Vector Machines, Gaussian Mixture Models, Distinctive Features, Language Identification


Bibliographic reference. Harwath, David / Hasegawa-Johnson, Mark (2010): "Phonetic landmark detection for automatic language identification", in SP-2010, paper 231.