Abstract
Camera trap imaging has emerged as a valuable tool for modern wildlife surveillance, enabling researchers to monitor and study wild animals and their behaviors. However, a significant challenge in camera trap data analysis is the labor-intensive task of species classification from the captured images. Deep learning has proven to be an effective way to reduce this workload for ecological researchers.
This thesis proposes the use of specific metadata, including temperature, location, and time, to enhance established image classification methods. We demonstrate the effectiveness of this approach on a dataset centered on the Norwegian climate. Compared against existing models widely used in the field, our models increased accuracy from 98.4% to 98.9%. While this increase may seem marginal, given that the models are already approaching perfect accuracy, we argue that the improvement is significant.
Furthermore, we demonstrate the potential for improving models with metadata without requiring extra work in the data collection phase. Using deep learning models for scene recognition, we achieved high prediction accuracy in an ablation study focused on classification purely from the metadata in our dataset. This automated pipeline can be used in future comprehensive networks that incorporate both image data and metadata, which could significantly enhance the image classification of wild animals.