12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Nearest Neighbors with Learned Distances for Phonetic Frame Classification

John Labiak (1), Karen Livescu (2)

(1) University of Chicago, USA
(2) Toyota Technological Institute at Chicago, USA

Nearest neighbor-based techniques provide an approach to acoustic modeling that avoids the often lengthy and heuristic process of training traditional Gaussian mixture-based models. Here we study the problem of choosing the distance metric for a k-nearest neighbor (k-NN) phonetic frame classifier. We compare the standard Euclidean distance to two learned Mahalanobis distances, based on large-margin nearest neighbors (LMNN) and locality preserving projections (LPP). We use locality sensitive hashing for approximate nearest neighbor search to reduce the test time of k-NN classification. We compare the error rates of these approaches, as well as of baseline Gaussian mixture-based and multilayer perceptron classifiers, on the task of phonetic frame classification of speech from the TIMIT database. The k-NN classifiers outperform Gaussian mixture models, but not multilayer perceptrons. We find that the best k-NN classification performance is obtained using LPP, while LMNN is close behind.
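
To make the role of the distance metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Mahalanobis distance is parameterized by a linear map L, so that the distance between x and y equals the Euclidean distance between Lx and Ly. The function name knn_classify, the toy two-dimensional data, and the hand-picked matrix L are all hypothetical; in the setting described above, L would be learned (for example by LMNN or LPP), the inputs would be acoustic feature frames, and an approximate search such as locality sensitive hashing would replace the exhaustive search used here.

```python
import numpy as np

def knn_classify(train_x, train_y, query, L=None, k=3):
    """Majority-vote k-NN label for `query` (hypothetical helper).

    With L=None the distance is Euclidean. Otherwise the distance is
    ||L(x - y)||, i.e. a Mahalanobis distance with matrix M = L^T L.
    """
    if L is not None:
        # Project both the training frames and the query with L.
        train_x = train_x @ L.T
        query = L @ query
    # Exhaustive search: distance from the query to every training frame.
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    # Majority vote among the k nearest neighbors.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data (assumed for illustration): two frames per class.
train_x = np.array([[0.0, 0.0], [0.0, 4.0], [2.5, 2.0], [2.5, 2.2]])
train_y = np.array([0, 0, 1, 1])
query = np.array([1.0, 2.0])

# Hand-picked transform that down-weights the second dimension.
# A learned metric would take its place in practice.
L = np.diag([1.0, 0.1])

print(knn_classify(train_x, train_y, query, k=3))        # prints 1
print(knn_classify(train_x, train_y, query, L=L, k=3))   # prints 0
```

In this toy case the same query receives a different label once the second dimension is down-weighted, which illustrates why the choice of metric can change k-NN error rates.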

Bibliographic reference. Labiak, John / Livescu, Karen (2011): "Nearest neighbors with learned distances for phonetic frame classification", in INTERSPEECH-2011, 2337-2340.