Nearest neighbor-based techniques provide an approach to acoustic modeling that avoids the often lengthy and heuristic process of training traditional Gaussian mixture-based models. Here we study the problem of choosing the distance metric for a k-nearest neighbor (k-NN) phonetic frame classifier. We compare the standard Euclidean distance to two learned Mahalanobis distances, based on large-margin nearest neighbors (LMNN) and locality preserving projections (LPP). We use locality-sensitive hashing for approximate nearest neighbor search to reduce the test time of k-NN classification. We compare the error rates of these approaches, as well as of baseline Gaussian mixture-based and multilayer perceptron classifiers, on the task of phonetic frame classification of speech from the TIMIT database. The k-NN classifiers outperform Gaussian mixture models, but not multilayer perceptrons. We find that the best k-NN classification performance is obtained using LPP, with LMNN close behind.
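To illustrate the core idea, the following is a minimal sketch (not the paper's implementation) of k-NN classification under a learned Mahalanobis metric. A method such as LMNN or LPP produces a linear transform L, and the Mahalanobis distance with M = LᵀL equals the Euclidean distance after projecting by L; the function name, toy data, and parameters below are illustrative assumptions.

```python
import numpy as np

def knn_classify(x, X_train, y_train, L, k=5):
    """Classify a single frame x by k-NN under the Mahalanobis
    metric induced by a linear transform L (i.e., Euclidean
    distance after projecting all points by L)."""
    # Project training data and the query into the learned space.
    Xp = X_train @ L.T
    xp = x @ L.T
    # Squared Euclidean distance in the projected space equals
    # squared Mahalanobis distance with M = L^T L.
    d2 = np.sum((Xp - xp) ** 2, axis=1)
    # Majority vote among the k nearest training frames.
    nn = np.argsort(d2)[:k]
    labels, counts = np.unique(y_train[nn], return_counts=True)
    return labels[np.argmax(counts)]

# Toy example: two well-separated 2-D classes; the identity L
# recovers plain Euclidean k-NN. In practice L would come from
# LMNN or LPP training, and an approximate-NN index (e.g. LSH)
# would replace the exhaustive distance computation.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
L = np.eye(2)  # placeholder for a learned transform
print(knn_classify(np.array([4.8, 5.1]), X, y, L, k=5))
```

This brute-force search is O(n) per query; the locality-sensitive hashing mentioned in the abstract replaces it with a sublinear approximate lookup at test time.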
Bibliographic reference. Labiak, John / Livescu, Karen (2011): "Nearest neighbors with learned distances for phonetic frame classification", In INTERSPEECH-2011, 2337-2340.