Interpretable global-local dynamics for the prediction of eye fixations in autonomous driving scenarios

authors

published in

IEEE Access Journal

publication date

December 2020

start page

217068

end page

217085

volume

8

Digital Object Identifier (DOI)

https://doi.org/10.1109/access.2020.3041606

full text

http://hdl.handle.net/10016/33780

Electronic International Standard Serial Number (EISSN)

2169-3536

abstract

Human eye movements while driving reveal that visual attention largely depends on the context in which it occurs. Furthermore, an autonomous vehicle that performs this function would be more reliable if its outputs were understandable. Capsule Networks have been presented as a great opportunity to explore new horizons in the Computer Vision field, due to their capability to structure and relate latent information. In this article, we present a hierarchical approach for the prediction of eye fixations in autonomous driving scenarios. Context-driven visual attention can be modeled by considering different conditions which, in turn, are represented as combinations of several spatio-temporal features. With the aim of learning these conditions, we have built an encoder-decoder network which merges information from visual features using a global-local definition of capsules. Two types of capsules are distinguished: representational capsules for features and discriminative capsules for conditions. The latter, together with eye fixations recorded with wearable eye-tracking glasses, allow the model to learn both to predict contextual conditions and to estimate visual attention by means of a multi-task loss function. Experiments show how our approach is able to express either frame-level (global) or pixel-wise (local) relationships between features and contextual conditions, allowing for interpretability while maintaining or improving on the performance of related black-box systems in the literature. Indeed, our proposal offers an improvement of 29% in terms of information gain with respect to the best performance reported in the literature.

