Speech Prosody 2010
Chicago, IL, USA
This paper presents a method of augmenting shifted-delta cepstral coefficients (SDCCs) with the classification outputs of an array of support vector machines (SVMs) trained to detect a set of manner and place features in telephone speech. The SVM array provides a broad phonetic classification of each frame. When these outputs are concatenated with the SDCCs to form a hybrid feature vector for each acoustic frame, a set of Gaussian mixture models (GMMs) can be trained to perform automatic language identification (LID). The NTIMIT telephone-band speech corpus was used to train the SVM-based distinctive feature recognizers, while the NIST CallFriend telephone speech corpus was used to train and test the rest of the system.
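The abstract does not give the SDCC configuration, the GMM size, or the exact form of the SVM outputs. The following is a minimal sketch of how a hybrid frame could be formed and scored. It assumes the widely used N-d-P-k (here 7-1-3-7) SDCC parameterization, one real-valued score per SVM per frame, and diagonal-covariance GMMs whose parameters have already been estimated (EM training is omitted). All function names are hypothetical, and this is an illustration of the general technique, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# One diagonal-covariance Gaussian component: (weight, mean, variance).
Component = Tuple[float, Vector, Vector]


def sdcc(cepstra: Sequence[Sequence[float]], n: int = 7, d: int = 1,
         p: int = 3, k: int = 7) -> List[Vector]:
    """Shifted-delta cepstra in the common N-d-P-k parameterization.

    Block i (0 <= i < k) of frame t holds c[t + i*p + d] - c[t + i*p - d]
    over the first n cepstral coefficients. Out-of-range frame indices are
    clamped to the first/last frame (an assumed edge-handling choice).
    """
    last = len(cepstra) - 1

    def frame(j: int) -> Sequence[float]:
        return cepstra[min(max(j, 0), last)]

    out: List[Vector] = []
    for t in range(len(cepstra)):
        vec: Vector = []
        for i in range(k):
            ahead, behind = frame(t + i * p + d), frame(t + i * p - d)
            vec.extend(ahead[c] - behind[c] for c in range(n))
        out.append(vec)
    return out


def hybrid_features(sdcc_frames: Sequence[Vector],
                    svm_outputs: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate each frame's SDCC vector with that frame's SVM outputs."""
    if len(sdcc_frames) != len(svm_outputs):
        raise ValueError("SDCC and SVM streams must have the same frame count")
    return [list(s) + list(f) for s, f in zip(sdcc_frames, svm_outputs)]


def gmm_log_likelihood(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """log sum_m w_m N(x; mu_m, diag(var_m)), computed with log-sum-exp."""
    logs = []
    for weight, mean, var in gmm:
        if not (len(x) == len(mean) == len(var)):
            raise ValueError("feature and model dimensions differ")
        ll = math.log(weight)
        for xi, mi, vi in zip(x, mean, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def identify_language(frames: Sequence[Vector],
                      models: Dict[str, Sequence[Component]]
                      ) -> Tuple[str, Dict[str, float]]:
    """Pick the language whose GMM gives the highest mean per-frame log-likelihood."""
    scores = {lang: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
              for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    cep = [[float(t), 2.0 * t] for t in range(10)]  # toy 2-coefficient "cepstra"
    feats = hybrid_features(sdcc(cep, n=2, d=1, p=3, k=2),
                            [[0.9, -0.4]] * len(cep))  # hypothetical SVM scores
    dim = len(feats[0])  # 2 * 2 SDCC values + 2 SVM outputs = 6
    models = {"lang_a": [(1.0, [0.0] * dim, [1.0] * dim)],
              "lang_b": [(0.5, [2.0] * dim, [4.0] * dim),
                         (0.5, [-2.0] * dim, [4.0] * dim)]}
    print(identify_language(feats, models)[0])
```

Averaging per-frame log-likelihoods and taking the highest-scoring language model is one common closed-set LID decision rule. The paper's back end may differ, for example in how scores are normalized across languages.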
Index Terms: Support Vector Machines, Gaussian Mixture Models, Distinctive Features, Language Identification
Bibliographic reference. Harwath, David / Hasegawa-Johnson, Mark (2010): "Phonetic landmark detection for automatic language identification", In SP-2010, paper 231.