Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems

publication date

  • December 2013

start page

  • 37

end page

  • 53


  • 6


  • 2

International Standard Serial Number (ISSN)

  • 2255-2863


abstract

  • In this paper we present a novel framework for integrating visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture for fusion systems (JDL) and combines techniques from Artificial Intelligence, Natural Language Processing and User Modeling to provide enhanced interaction with users. First, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), in which several types of autonomous agents work in a coalition to track targets and make inferences about their positions. Second, enhanced conversational agents facilitate human-computer interaction through speech. Third, a statistical methodology models the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from successive interactions. Finally, a technique is proposed to fuse these multimodal information sources and to take the result into account when deciding the next system action.


  • Computer Science


keywords

  • software agents; multimodal fusion; visual sensor networks; surveillance applications; spoken interaction; conversational agents; user modeling; dialog management