5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Speech: A Privileged Modality

Luc E. Julia, Adam J. Cheyer

STAR Laboratory, SRI International, Menlo Park, California, USA
Artificial Intelligence Center, SRI International, Menlo Park, California, USA

Ever since the publication of Bolt's ground-breaking "Put-That-There" paper [1], providing multiple modalities as a means of easing the interaction between humans and computers has been a desirable attribute of user interface design. In Bolt's early approach, the style of modality combination required the user to conform to a rigid order when entering spoken and gestural commands. In the early 1990s, the idea of synergistic multimodal combination began to emerge [4], although actual implemented systems (generally using keyboard and mouse) remained far from synergistic. Next-generation approaches used time-stamped events to reason about the fusion of multimodal input arriving in a given time window, but these systems were hindered by time-consuming matching algorithms. To overcome this limitation, we proposed [6] a truly synergistic application and a distributed architecture for flexible interaction that reduces the need for explicit time stamping. Our slot-based approach is command directed, making it suitable for applications using speech as a primary modality. In this article, we use our interaction model to demonstrate that during multimodal fusion, speech should be a privileged modality, driving the interpretation of a query, and that in certain cases, speech has even more power to override and modify the combination of other modalities than previously believed.

Bibliographic reference. Julia, Luc E. / Cheyer, Adam J. (1997): "Speech: a privileged modality", in EUROSPEECH-1997, 1843-1846.