Visually-Grounded Language Model for Human-Robot Interaction

Zambuto, Daniele; Dindo, Haris; Chella, Antonio
2010-01-01

Abstract

Visually grounded human-robot interaction is recognized as an essential ingredient of socially intelligent robots, and the integration of vision and language increasingly attracts the attention of researchers in diverse fields. However, most systems lack the capability to adapt and expand beyond a preprogrammed set of communicative behaviors, and their linguistic capabilities remain far from satisfactory, which makes them unsuitable for real-world applications. In this paper we present a system in which a robotic agent learns a grounded language model by actively interacting with a human user. The model is grounded in the sense that the meaning of words is linked to the agent's concrete sensorimotor experience, and linguistic rules are automatically extracted from the interaction data. The system has been tested on the NAO humanoid robot, where it understands and generates appropriate natural language descriptions of real objects. It can also conduct a verbal interaction with a human partner in potentially ambiguous situations.
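As a purely illustrative aside (not the model described in the paper), the following minimal Python sketch shows one simple way word meanings can be grounded in perception: each word is associated with a statistical model over hypothetical visual features collected during interaction, and the robot describes a newly perceived object by choosing the words whose models best explain what it sees. All feature names, example values, and class names here are assumptions made for the sketch.

```python
import numpy as np

# Purely illustrative word-grounding sketch: each word keeps the perceptual
# feature vectors ([hue, size], both hypothetical) observed whenever the word
# was used, and models them with a diagonal Gaussian.
class GroundedWord:
    def __init__(self, word):
        self.word = word
        self.samples = []  # feature vectors seen together with this word

    def observe(self, features):
        self.samples.append(np.asarray(features, dtype=float))

    def log_likelihood(self, features):
        # Score a new feature vector under a diagonal Gaussian fit to the samples.
        data = np.stack(self.samples)
        mean = data.mean(axis=0)
        var = data.var(axis=0) + 1e-3  # variance floor avoids division by zero
        x = np.asarray(features, dtype=float)
        return float(-0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var)))

# "Training" from hypothetical interaction episodes: a word paired with the
# perceived [hue, size] of the object the human was talking about.
lexicon = {w: GroundedWord(w) for w in ["red", "green", "small", "big"]}
lexicon["red"].observe([0.02, 0.2]);   lexicon["red"].observe([0.05, 0.9])
lexicon["green"].observe([0.33, 0.3]); lexicon["green"].observe([0.36, 0.8])
lexicon["small"].observe([0.10, 0.2]); lexicon["small"].observe([0.90, 0.25])
lexicon["big"].observe([0.10, 0.8]);   lexicon["big"].observe([0.70, 0.95])

# Generation: describe a new object with the best-matching colour and size words.
new_object = [0.34, 0.85]  # greenish hue, large apparent size
colour = max(["red", "green"], key=lambda w: lexicon[w].log_likelihood(new_object))
size = max(["small", "big"], key=lambda w: lexicon[w].log_likelihood(new_object))
print(f"The object is {colour} and {size}")  # -> "The object is green and big"
```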
Subject area: ING-INF/05 - Information Processing Systems (Sistemi di Elaborazione delle Informazioni)
Zambuto, D., Dindo, H., Chella, A. (2010). Visually-Grounded Language Model for Human-Robot Interaction. International Journal of Computational Linguistics Research, 1(3), 105-115.
Files in this item:
  • 2010_IJCLR_paper.pdf (open access, 2.83 MB, Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/61881
Citations
  • Scopus: 16