Show simple item record

dc.contributor.advisor: Pedersen, Marius
dc.contributor.advisor: Waaseth, Kjartan Sebastian
dc.contributor.author: Ullah, Ehsan
dc.date.accessioned: 2022-10-01T17:23:56Z
dc.date.available: 2022-10-01T17:23:56Z
dc.date.issued: 2022
dc.identifier: no.ntnu:inspera:118516831:67637207
dc.identifier.uri: https://hdl.handle.net/11250/3023069
dc.description.abstract: High dynamic range (HDR) video reconstruction is a very challenging task, especially from a sequence of frames with alternating exposures. A convenient approach to generating HDR video is to acquire a sequence of images with alternating exposures using conventional camera systems and reconstruct the missing content or details at each frame. Unfortunately, conventional methods are typically slow and incapable of dealing with complex examples. Current learning-based techniques usually align the low dynamic range (LDR) input sequence by estimating optical flow between neighboring frames. The aligned LDR images are then merged to produce the final HDR output. However, due to noise in the under-exposed regions and missing content in the over-exposed regions, precise alignment and fusion remain a major challenge and often result in unappealing ghosting artifacts. In this work, we propose a learning-based approach to HDR video reconstruction with alternating exposures. Our approach has three main stages. The first stage aligns the neighbouring frames to the current frame by estimating the flows between them. The second stage is composed of multi-attention modules and a pyramid cascading deformable (PCD) alignment module that refine the previously aligned features and extract only the information from the neighbouring frames that is relevant to the reference frame. The final stage performs the merging: it takes the features extracted by the multi-attention guided and PCD alignment modules as input and estimates the final HDR scene using a series of dilated selective kernel fusion residual dense blocks (DSKFRDBs) with a global residual learning strategy, allowing the network to fill the over-exposed regions with rich details. The whole network is trained end-to-end for HDR video estimation on publicly available HDR video datasets, with simulated limitations of conventional digital cameras.
We employ an L1 loss and a combined L1 + MS-SSIM loss function to minimize the error between the estimated and ground-truth HDR images. We demonstrate the performance of our method on a number of HDR test datasets, achieving in most cases better alignment and hallucination of details in the over-exposed regions than recent state-of-the-art methods, while using fewer network parameters.
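The combined L1 + MS-SSIM objective mentioned in the abstract can be sketched roughly as below. This is a simplified illustration, not the thesis implementation: the single-scale, whole-image SSIM here stands in for the windowed multi-scale MS-SSIM, and the blending weight `alpha = 0.84` is a common choice in the literature, not necessarily the weighting used in the thesis.

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-scale SSIM computed globally over the whole image --
    # a simplified stand-in for the windowed, multi-scale MS-SSIM.
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def combined_loss(pred, target, alpha=0.84):
    # Blend of a structural term (1 - SSIM) and an absolute-error (L1)
    # term; alpha is an illustrative weight, not taken from the thesis.
    l1 = np.abs(pred - target).mean()
    return alpha * (1.0 - ssim(pred, target)) + (1.0 - alpha) * l1
```

For identical images the loss is zero (SSIM is 1 and the L1 term vanishes), and it grows as the prediction drifts from the target, which is the behaviour the training objective relies on.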
dc.language: eng
dc.publisher: NTNU
dc.title: Multi-Attention SKFHDRNet For HDR Video Reconstruction
dc.type: Master thesis

