This paper focuses on the automatic detection of a person's blood alcohol level using automatic speech processing. We compare five feature types with different modeling approaches. Experiments are based on the ALC corpus of the INTERSPEECH 2011 Speaker State Challenge. The classification task is restricted to detecting a blood alcohol level above 0.5‰. Three feature sets are based on spectral observations: MFCCs, PLPs, and TRAPS. These are modeled by GMMs. Classification is done either by a Gaussian classifier or by SVMs. In the latter case, classification is based on GMM supervectors, i.e., the concatenation of the GMM mean vectors. A prosodic system extracts a 292-dimensional feature vector based on a voiced-unvoiced decision. A transcription-based system makes use of text transcriptions, with features related to phoneme durations and textual structure. We compare the stand-alone performance of these systems and combine them at the score level by logistic regression. The best stand-alone system is the transcription-based one, which outperforms the baseline by 4.8% on the development set. Score-level combination gave a substantial boost when the spectral systems were added (73.6%), a relative improvement of 12.7% over the baseline. On the test set we achieved a UA of 68.6%, a significant improvement of 4.1% over the baseline system.
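The results above are reported as UA. Assuming this denotes unweighted average recall (the mean of the per-class recalls, so that each class counts equally regardless of its size), the metric can be sketched as follows. The function name, the class labels `"AL"` and `"NAL"`, and the toy data are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of the per-class recalls (assumed meaning of UA)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of each class that occurs in the reference labels.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: "AL" = above 0.5 per mille, "NAL" = at or below it.
y_true = ["NAL"] * 6 + ["AL"] * 2
y_pred = ["NAL"] * 5 + ["AL"] + ["AL", "NAL"]

ua = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA = {ua:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the majority class is recognized better than the minority class, so plain accuracy comes out higher than UA. That gap is why an unweighted measure is a natural choice when the two classes are unevenly represented.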
Bibliographic reference. Bocklet, Tobias / Riedhammer, Korbinian / Nöth, Elmar (2011): "Drink and speak: on the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features", In INTERSPEECH-2011, 3213-3216.