Is that what you want? Architectural Challenges of Engaging in Multi-Modal Natural Language Interactions with Humans

Matthias Scheutz


Task-based natural language interactions with robots inherently involve multi-modal aspects, from simple gestures accompanying utterances to the resolution of complex referential expressions that require perceptual integration and reasoning. In this presentation, I will discuss several architectural challenges of integrated natural language understanding, perception, and action that have to be addressed for robots to handle natural multi-modal interactions with humans. These challenges include resolving references appropriately in open worlds using perceptual information and common-sense reasoning, understanding intended meanings in indirect speech acts, and automatically applying normative constraints throughout the interaction. I will illustrate the various conceptual points with examples and robot demonstrations from our own attempts to tackle some of these challenges.


Cite as: Scheutz, M. (2018) Is that what you want? Architectural Challenges of Engaging in Multi-Modal Natural Language Interactions with Humans. Proc. FAIM/ISCA Workshop on Artificial Intelligence for Multimodal Human Robot Interaction.


@inproceedings{Scheutz2018,
  author={Matthias Scheutz},
  title={Is that what you want? Architectural Challenges of Engaging in Multi-Modal Natural Language Interactions with Humans},
  year=2018,
  booktitle={Proc. FAIM/ISCA Workshop on Artificial Intelligence for Multimodal Human Robot Interaction}
}