A multimodal emotion detection system during human-robot interaction Articles uri icon

publication date

  • November 2013

start page

  • 15549

end page

  • 15581


  • 11


  • 13

International Standard Serial Number (ISSN)

  • 1424-3210

Electronic International Standard Serial Number (EISSN)

  • 1424-8220


  • In this paper, a multimodal user-emotion detection system for social robots is presented. This system is intended to be used during human-robot interaction, and it is integrated as part of the overall interaction system of the robot: the Robotics Dialog System (RDS). Two modes are used to detect emotions: the voice and face expression analysis. In order to analyze the voice of the user, a new component has been developed: Gender and Emotion Voice Analysis (GEVA), which is written using the Chuck language. For emotion detection in facial expressions, the system, Gender and Emotion Facial Analysis (GEFA), has been also developed. This last system integrates two third-party solutions: Sophisticated High-speed Object Recognition Engine (SHORE) and Computer Expression Recognition Toolbox (CERT). Once these new components (GEVA and GEFA) give their results, a decision rule is applied in order to combine the information given by both of them. The result of this rule, the detected emotion, is integrated into the dialog system through communicative acts. Hence, each communicative act gives, among other things, the detected emotion of the user to the RDS so it can adapt its strategy in order to get a greater satisfaction degree during the human-robot dialog. Each of the new components, GEVA and GEFA, can also be used individually. Moreover, they are integrated with the robotic control platform ROS (Robot Operating System). Several experiments with real users were performed to determine the accuracy of each component and to set the final decision rule. The results obtained from applying this decision rule in these experiments show a high success rate in automatic user emotion recognition, improving the results given by the two information channels (audio and visual) separately.


  • Robotics and Industrial Informatics


  • emotion recognition; affective computing; human-robot interaction; dialog systems; facs; facial expressions; face detection; recognition; speech; body