This paper studies the properties of the Histograms of Acoustic Co-occurrences (HAC) approach to acoustic modeling. While HAC vectors have predominantly been used with matrix decomposition algorithms, we show that the additivity and sparseness constraints inherent in HAC lead to a representational space in which utterances are linearly separable with respect to the words they contain. This implies that words in test utterances can be detected and located by a classifier trained without any information about word order or word location. We demonstrate this by showing that an ensemble of linear classifiers reaches excellent detection scores on the TIDIGITS dataset. We further explore the usefulness of linear separability in the HAC space by demonstrating a sliding window decoder for continuous speech recognition.
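The separability argument in the abstract can be illustrated with a minimal toy sketch. It is not the paper's HAC pipeline: the vocabulary, the event dimensions in `WORD_EVENTS`, and the helpers `bag_of_events`, `train_perceptron`, and `detect` are all hypothetical, and a plain perceptron stands in for whatever linear classifiers the authors used. The sketch assumes an utterance vector is exactly the sum of sparse per-word count vectors, and trains one linear detector per word from word-presence labels only.

```python
from itertools import product

# Hypothetical toy vocabulary (an assumption, not real HAC features).
# Each word maps to a sparse vector of event counts: dimension 0 is an
# event shared by all words, the other two dimensions are word-specific.
WORD_EVENTS = {
    "one":   {0: 1, 1: 1, 2: 1},
    "two":   {0: 1, 3: 1, 4: 1},
    "three": {0: 1, 5: 1, 6: 1},
    "four":  {0: 1, 7: 1, 8: 1},
}
DIM = 9


def bag_of_events(words):
    """Additivity: sum the per-word count vectors; word order is discarded."""
    vec = [0.0] * DIM
    for w in words:
        for dim, count in WORD_EVENTS[w].items():
            vec[dim] += count
    return vec


def score(weights, bias, x):
    """Linear decision value for one utterance vector."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias


def train_perceptron(samples, labels, max_epochs=500):
    """Plain perceptron; stops after the first epoch without mistakes."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * score(weights, bias, x) <= 0:
                weights = [wi + y * xi for wi, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Training data: every word sequence of length 1 to 3. The labels say only
# whether a word occurs in the utterance, not where or in what order.
utterances = [list(p) for n in (1, 2, 3) for p in product(WORD_EVENTS, repeat=n)]
vectors = [bag_of_events(u) for u in utterances]
detectors = {
    word: train_perceptron(vectors, [1 if word in u else -1 for u in utterances])
    for word in WORD_EVENTS
}


def detect(words):
    """Return the set of words whose linear detector fires on the utterance."""
    x = bag_of_events(words)
    return {w for w, (wt, b) in detectors.items() if score(wt, b, x) > 0}
```

In this toy setting each word is linearly separable from the rest because its word-specific dimensions are nonzero exactly when the word is present, so the perceptron is guaranteed to converge on the training utterances. Real HAC vectors are far higher-dimensional and only approximately additive across word boundaries, so this sketch conveys the intuition rather than the reported results.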
Bibliographic reference. Versteegh, Maarten / Bosch, Louis ten (2013): "Detecting words in speech using linear separability in a bag-of-events vector space", In INTERSPEECH-2013, 680-684.