Abstract
Capturing scenes with a wide dynamic range is inherently challenging due to sensor limitations, which cause a loss of dynamic range in captured images. By fusing multiple images captured at varying exposures, a single high dynamic range (HDR) image can be generated that covers the full range of scene luminance. However, this multi-exposure fusion is challenging, particularly when the images are misaligned due to real-world factors such as object or camera motion. Conventional methods often struggle to mitigate alignment and ghosting artifacts and to handle occlusions. This study introduces an end-to-end HDR fusion pipeline that processes RAW images captured at various exposures, fuses their details, and generates a single HDR image, leveraging deep learning to address alignment, ghosting, and occlusion challenges. Unlike existing methods, which predominantly operate on non-linear images, our approach relies on RAW images because of their linear properties. Since no comprehensive dataset of multi-exposure RAW images with corresponding HDR ground truth is available, we develop specialized dataset creation pipelines. We then design and train deep neural network models for HDR fusion on linear images. These models generalize across camera parameters, including Exposure Value (EV) and ISO settings, and effectively align images, suppress ghosting, and handle occlusions. In addition, we curate a real-world RAW image dataset for the standardized evaluation of different HDR fusion models. In summary, this research presents a novel deep-learning approach to RAW-based HDR fusion that alleviates ghosting artifacts caused by motion. The introduced dataset creation pipelines and trained models offer promising avenues for further advancements in HDR imaging, marking a significant step towards improving the fidelity of HDR images obtained from RAW data.