Electronic International Standard Serial Number (EISSN)
1433-3058
Abstract
As autonomous vehicles move closer to everyday use, the need for architectures that function as redundant pipelines becomes increasingly critical. To address this need without inflating costs, researchers aim to avoid duplicating expensive sensors such as LiDAR. In this work, we propose using monocular cameras, already essential to several modules of the autonomous platform, for 3D scene understanding. Although many single-image depth estimation methods have been proposed in the literature, they typically rely on complex neural network ensembles that extract dense feature maps, resulting in high computational cost. Instead, we propose a novel and inherently efficient method for obtaining depth images that replaces tangled neural architectures with attention mechanisms applied to basic encoder-decoder models. We evaluate our method on the public KITTI dataset and in real-world experiments on our automated vehicle. The results demonstrate the viability of our approach, which competes with intricate state-of-the-art methods while outperforming most attention-based alternatives.
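The abstract does not specify the architecture, so the following is only a minimal PyTorch sketch of the general idea it names: an attention mechanism applied to a basic encoder-decoder that maps a monocular RGB image to a dense depth map. The module names, layer sizes, and the squeeze-and-excitation-style channel gate are illustrative assumptions, not the authors' implementation; the sigmoid output is read as normalized depth, a common convention in monocular depth estimation.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Channel attention: reweights bottleneck features (illustrative stand-in
    for the attention layers the abstract mentions)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # scale each channel by its attention weight

class DepthNet(nn.Module):
    """Basic encoder-decoder: RGB image in, single-channel depth map out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attention = AttentionGate(64)  # attention on the bottleneck
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.attention(self.encoder(x)))

# Example: one forward pass on a KITTI-like crop (batch of 1, 3 x 192 x 640).
depth = DepthNet()(torch.randn(1, 3, 192, 640))
print(depth.shape)  # torch.Size([1, 1, 192, 640])

The point of the sketch is the efficiency argument: a single lightweight gate on the bottleneck adds only a few parameters to a plain encoder-decoder, in contrast to the dense-feature ensembles the abstract contrasts against.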
Classification
Subjects
Robotics and Industrial Informatics
Keywords
depth estimation; deep learning; attention layers; autonomous driving