INTERSPEECH 2006 - ICSLP
Recognizing a speaker's interest within a human dialog bears great potential for many commercial applications. In this work we therefore introduce an approach that analyses acoustic and linguistic cues of a spoken utterance. To find relevant acoustic attributes, more than 5k high-level features are generated systematically from prosodic and spectral feature contours by means of descriptive statistical analysis, followed by feature space optimization. Linguistic information is integrated through a bag-of-words representation that relies on a speech recognizer’s output. One main aspect is the database of more than 2k spontaneous sub-speaker turns recorded and annotated for this analysis. Several influence factors, such as microphone distance and ASR versus annotation of spoken content, are discussed. Overall, remarkable performance of a running prototype that discriminates between three levels of interest can be reported.
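The two feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the handful of statistics chosen, the toy pitch contour, and the three-word vocabulary are all assumptions made for illustration. The abstract does not list which functionals or vocabulary were used, and the feature space optimization step is omitted here.

```python
from collections import Counter
from statistics import mean, pstdev


def contour_functionals(contour):
    """Map a variable-length contour (e.g. pitch per frame) to fixed-size statistics."""
    n = len(contour)
    peak = max(contour)
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "min": min(contour),
        "max": peak,
        "range": peak - min(contour),
        # Relative position of the maximum within the utterance (0..1).
        "argmax_rel": contour.index(peak) / (n - 1) if n > 1 else 0.0,
    }


def bag_of_words(transcript, vocabulary):
    """Count how often each vocabulary term occurs in the transcript."""
    counts = Counter(transcript.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical toy inputs, not data from the paper.
pitch_hz = [110.0, 120.0, 150.0, 130.0, 115.0]
vocab = ["really", "interesting", "boring"]

acoustic = contour_functionals(pitch_hz)
linguistic = bag_of_words("That is really really interesting", vocab)

# One fixed-length vector per utterance: acoustic statistics plus term counts.
feature_vector = list(acoustic.values()) + linguistic
```

Applying many such statistics to many prosodic and spectral contours is one way a feature set can grow into the thousands, which is why a subsequent selection step becomes necessary.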
Bibliographic reference. Schuller, Björn / Köhler, Niels / Müller, Ronald / Rigoll, Gerhard (2006): "Recognition of interest in human conversational speech", In INTERSPEECH-2006, paper 1621-Tue1A3O.1.