dc.contributor.advisor: Mohammed, Ahmed
dc.contributor.advisor: Waszak, Maryna
dc.contributor.advisor: Ludvigsen, Martin
dc.contributor.author: Azad, Md Abulkalam
dc.date.accessioned: 2023-10-04T17:19:36Z
dc.date.available: 2023-10-04T17:19:36Z
dc.date.issued: 2023
dc.identifier: no.ntnu:inspera:140295966:121373137
dc.identifier.uri: https://hdl.handle.net/11250/3094213
dc.description.abstract: Today, ship hull inspection, including the examination of the external coating and the detection of defects and other types of external degradation such as corrosion and marine growth, is conducted underwater by means of Remotely Operated Vehicles (ROVs). The inspection process consists of a manual video analysis, which is time-consuming and labour-intensive. To address this, we propose an automatic video analysis system using deep learning and computer vision to improve upon existing methods that only consider spatial information on individual frames in underwater ship hull video inspection. By exploring the benefits of adding temporal information and analyzing frame-based classifiers, we propose a multi-label video classification model that exploits the self-attention mechanism of transformers to capture spatiotemporal attention in consecutive video frames. Apart from utilizing off-the-shelf vision transformers for extracting spatial information, we have incorporated the transformer from the original language model to extract temporal information from the video. We highlight the distinct characteristics of these transformers and make empirical modifications to improve their performance. Furthermore, we investigate the self-attention mechanism, a critical component of the transformer architecture. We introduce a different, lightweight approach, single query attention computation, which improves the robustness of the model. Using attention scores as weights improves the overall performance and the reliability of our approach. In summary, we present three distinct approaches to multi-label video classification; the results show great potential and position this work as a benchmark for future research and development in underwater video inspection applications.
dc.language: eng
dc.publisher: NTNU
dc.title: Multi-label Video Classification for Underwater Ship Inspection
dc.type: Master thesis

