Abstract
In everyday life, the human visual system (HVS) is well equipped to compensate for the impact of varying illumination on object colour perception, a phenomenon called colour constancy (CC). CC is the ability of the HVS to perceive objects as having a constant colour under different illumination conditions. As with other HVS capabilities, computer vision (CV) seeks to mimic the HVS's power to compensate for the effects of changing illumination. Many traditional and state-of-the-art (SoTA) techniques have been developed to achieve this objective in CV. These techniques achieve CC by eliminating the impact of an unknown light source, transforming the image so that it appears to have been captured under a canonical light source.
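For illustration, this correction is commonly formalised with the diagonal (von Kries) model, in which each channel of the observed image is divided by the corresponding component of the estimated illuminant $\hat{e}$ (the notation here is illustrative, not taken verbatim from the thesis):
\[
I_c^{\text{corrected}}(x) = \frac{I_c(x)}{\hat{e}_c}, \qquad c \in \{R, G, B\},
\]
where $I_c(x)$ is the observed intensity of channel $c$ at pixel $x$. Estimating $\hat{e}$ from a single image is the core problem addressed by the methods discussed below.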
While SoTA approaches use deep learning (DL) models for illuminant estimation, classical methods primarily concentrate on gamut mapping and statistical solutions. Several DL-based approaches to CC have been developed thus far, most of which concentrate on sensor-variant solutions (which do not produce good results if the test-time distribution differs from the training distribution), and only a small number on sensor-invariant solutions (which perform well even on a camera unseen at test time). Even these sensor-invariant techniques, however, have drawbacks, such as the need for large training datasets, that make them unsuitable for real-world applications.
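As a minimal sketch of the statistical family of classical methods, the snippet below implements the well-known grey-world estimator together with the diagonal correction; the function names are illustrative and do not come from the thesis:

\begin{verbatim}
import numpy as np

def grey_world_illuminant(image: np.ndarray) -> np.ndarray:
    """Estimate the scene illuminant under the grey-world assumption:
    the average reflectance in a scene is achromatic, so the mean of
    each channel is proportional to the illuminant colour.

    image: HxWx3 array in linear RGB.
    Returns a unit-norm RGB illuminant estimate.
    """
    est = image.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)

def correct_image(image: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    """Apply the diagonal (von Kries) correction so the image appears
    as if captured under a canonical (white) light source."""
    corrected = image / illuminant
    return corrected / corrected.max()  # renormalise to [0, 1]
\end{verbatim}

Such statistical estimators are fast and training-free, but they fail when the scene violates their underlying assumption, which is one motivation for learning-based approaches.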
Considering these constraints, this thesis aims to find the most suitable DL-based approach to enhance CC in the sensor-invariant case. The goal is to propose a solution based on a DL model, trained on a small dataset that only includes images from real-world cameras, which yields SoTA results when tested on cameras that were not used for training. The purpose of using a small dataset is to overcome the requirement for large datasets, which has frequently been reported in the literature on sensor-invariant CC. Similarly, as our goal is to train the model only with real-world data, we do not employ any data augmentation to create synthetic training data.
To improve sensor-invariant CC, our approach first learns the differences between sensor distributions and then predicts the illuminant. To achieve this objective, we train our model using contrastive learning on image pairs, i.e., a Siamese network approach, together with a specific colour metric, which we call the colour contrastive loss (CCL) function, based on metric learning. This strategy makes the model aware of sensor variations during training for two reasons: \textbf{\textit{i) the input pairs help the model to distinguish between the training cameras}}, and \textbf{\textit{ii) the CCL allows the model to learn complex sensor distributions in a higher-order polynomial space.}}
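A minimal sketch of this training setup, assuming a PyTorch implementation; the backbone architecture, the degree-2 polynomial expansion, and the margin value are illustrative placeholders rather than the exact design of the thesis:

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

def poly_expand(rgb: torch.Tensor) -> torch.Tensor:
    """Lift an RGB illuminant estimate into a higher-order polynomial
    space (degree-2 terms shown here; the thesis may use another degree)."""
    r, g, b = rgb.unbind(dim=-1)
    return torch.stack([r, g, b, r*r, g*g, b*b, r*g, r*b, g*b], dim=-1)

class SiameseIlluminantNet(nn.Module):
    """Shared-weight backbone applied to both images of a pair; each
    branch predicts an RGB illuminant for its input image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 3),
        )

    def forward(self, x1, x2):
        return self.backbone(x1), self.backbone(x2)

def colour_contrastive_loss(e1, e2, same_sensor, margin=1.0):
    """Contrastive (metric-learning) loss computed in polynomial space:
    pulls together estimates from the same sensor and pushes apart
    estimates from different sensors.

    same_sensor: float tensor of 1s (same camera) and 0s (different)."""
    d = F.pairwise_distance(poly_expand(e1), poly_expand(e2))
    pull = same_sensor * d.pow(2)
    push = (1 - same_sensor) * F.relu(margin - d).pow(2)
    return (pull + push).mean()
\end{verbatim}

In practice, a contrastive term like this would typically be combined with a standard illuminant-estimation loss (such as the angular error against the ground-truth illuminant), so that the model learns sensor-aware features while still predicting the illuminant.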
We conduct independent cross-sensor experiments using cross-validation to find the best model. Once the optimal model has been determined, we test it on another dataset and compare our findings with a previously suggested solution in the existing literature. The comparison demonstrates that our proposed technique achieves sensor-invariant colour constancy better than the cross-camera convolutional colour constancy method when operating on a single image.