13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Temporal and Situational Context Modeling for Improved Dominance Recognition in Meetings

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Institute for Human-Machine Communication, Technische Universität München, Germany

We present and evaluate a novel approach to automatically detecting a speaker's level of dominance in a meeting scenario. Since previous studies suggest that audio is the most important modality for dominance recognition, we focus on analyzing the speech signals recorded in multi-party meetings. Unlike recently published techniques that concentrate on frame-level hidden Markov modeling, we propose a recognition framework that operates on segmental data, and we investigate context modeling on three different levels to explore possible performance gains. First, we apply a set of statistical functionals to capture large-scale feature-level context within a speech segment. Second, we use bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks to model long-range temporal context between segments. Finally, we evaluate the benefit of incorporating situational context by simultaneously modeling the speech of all meeting participants. Overall, our approach yields a substantial increase in recognition accuracy compared to hidden Markov modeling.
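The first and third context levels can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Statistical functionals map a variable-length segment of frame-level low-level descriptors (LLDs) to one fixed-length vector. Situational context is approximated here by concatenating the target speaker's segment vector with those of all other participants. The functional set (`FUNCTIONALS`), the concatenation scheme, the participant ordering, and the function names `segment_functionals` and `situational_vector` are all hypothetical choices made for illustration. The abstract does not specify the actual feature set or how the participants' speech is combined.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# Hypothetical, illustrative subset of statistical functionals; the set
# actually used for the segmental features is not given in the abstract.
FUNCTIONALS = {
    "mean": mean,
    "std": pstdev,
    "min": min,
    "max": max,
    "range": lambda xs: max(xs) - min(xs),
}


def segment_functionals(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map a variable-length segment (frames x LLDs) to a fixed-length vector.

    Each functional is applied to each LLD trajectory, so the output length is
    n_llds * len(FUNCTIONALS), independent of the number of frames.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    n_llds = len(frames[0])
    features: List[float] = []
    for d in range(n_llds):
        trajectory = [frame[d] for frame in frames]
        for func in FUNCTIONALS.values():
            features.append(float(func(trajectory)))
    return features


def situational_vector(target: str, segment_feats: Dict[str, List[float]]) -> List[float]:
    """Assumed situational-context scheme: the target speaker's features first,
    followed by every other participant's features in sorted-name order."""
    others = sorted(p for p in segment_feats if p != target)
    vec = list(segment_feats[target])
    for p in others:
        vec.extend(segment_feats[p])
    return vec


if __name__ == "__main__":
    # Two frames, two LLDs per frame -> 2 LLDs * 5 functionals = 10 values.
    print(segment_functionals([[1.0, 10.0], [3.0, 20.0]]))
    print(situational_vector("B", {"A": [1.0], "B": [2.0], "C": [3.0]}))
```

In this sketch, the per-segment vectors produced for consecutive segments would form the input sequence to a sequence model such as a BLSTM, which supplies the second (inter-segment) level of context. That network is not shown here. Because functionals produce one vector per segment regardless of duration, the sequence model operates at the segment rate rather than the frame rate.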


Bibliographic reference.  Wöllmer, Martin / Eyben, Florian / Schuller, Björn / Rigoll, Gerhard (2012): "Temporal and situational context modeling for improved dominance recognition in meetings", In INTERSPEECH-2012, 350-353.