A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles

Bavirisetti, Durga Prasad; Martinsen, Herman Ryen; Kiss, Gabriel Hanssen; Lindseth, Frank

dc.contributor.author	Bavirisetti, Durga Prasad
dc.contributor.author	Martinsen, Herman Ryen
dc.contributor.author	Kiss, Gabriel Hanssen
dc.contributor.author	Lindseth, Frank
dc.date.accessioned	2024-03-05T11:13:46Z
dc.date.available	2024-03-05T11:13:46Z
dc.date.created	2024-01-02T12:51:05Z
dc.date.issued	2023
dc.identifier.citation	IEEE Open Journal of Intelligent Transportation Systems. 2023, 4, 909-928.	en_US
dc.identifier.issn	2687-7813
dc.identifier.uri	https://hdl.handle.net/11250/3121067
dc.description.abstract	In this paper, we investigate the use of Vision Transformers for processing and understanding visual data in an autonomous driving setting. Specifically, we explore the use of Vision Transformers for semantic segmentation and monocular depth estimation using only a single image as input. We present state-of-the-art Vision Transformers for these tasks and combine them into a multitask model. Through multiple experiments on four different street image datasets, we demonstrate that the multitask approach significantly reduces inference time while maintaining high accuracy for both tasks. Additionally, we show that changing the size of the Transformer-based backbone can be used as a trade-off between inference speed and accuracy. Furthermore, we investigate the use of synthetic data for pre-training and show that it effectively increases the accuracy of the model when real-world data is limited.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles	en_US
dc.title.alternative	A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	909-928	en_US
dc.source.volume	4	en_US
dc.source.journal	IEEE Open Journal of Intelligent Transportation Systems	en_US
dc.identifier.doi	10.1109/OJITS.2023.3335648
dc.identifier.cristin	2218908
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Associated file(s)

File name: A_Multi-Task_Vision_Transforme ...
Size: 14.73Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6592]
Publikasjoner fra CRIStin - NTNU [37533]

Unless otherwise stated, this item is licensed under Attribution 4.0 International