A combined informative and representative active learning approach for plankton taxa labeling
Peer reviewed, Journal article
Published version
Permanent link
https://hdl.handle.net/11250/3026896
Publication date
2021
Original version
Proceedings of SPIE, the International Society for Optical Engineering. 2021, 11878. 10.1117/12.2601096
Abstract
With an ever-increasing amount of image data, manual labeling has become the bottleneck in many machine learning applications. Plankton taxa labeling is especially challenging because of its complex nature, and the manual effort places a large burden on domain experts. The Active Learning (AL) paradigm is a promising research direction adopted in the literature to minimize that effort. Many AL approaches have been proposed in recent years to improve the labeling task by supporting the construction of large data sets suitable for training machine learning models while minimizing human involvement in the process. Our empirical study suggests that many modern active learning methods fail to incorporate both the samples that represent the statistical pattern of the data and the samples about which the machine learning model is not confident. Motivated by these limitations, we propose an algorithm that combines these two types of sampling in order to capture the data distribution of the whole feature space, prevent redundant sampling from correlated uncertainty queries, and fine-tune the inter-class decision boundary. Our experiments show that the proposed method outperforms each of the two sampling methods used separately. Further, it also proves to be efficient on both the CIFAR-10 data set and the more complex Kaggle plankton data set.
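
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to combine representative and informative sampling, not the authors' method. It assumes the unlabeled pool is described by a feature matrix and by the current model's class probabilities. The pool is partitioned with plain k-means (the representative part), and within each cluster the sample with the highest predictive entropy is queried (the informative part), so that at most one query comes from each region of feature space. The function names `predictive_entropy`, `kmeans_labels`, and `select_queries` are hypothetical and chosen for illustration.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = less confident)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def kmeans_labels(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns a cluster index per sample."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers does not modify `features`.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def select_queries(features, probs, budget, seed=0):
    """Pick up to `budget` pool indices: the most uncertain sample of each cluster.

    Assumption: one cluster per query slot. Empty clusters yield no query,
    so fewer than `budget` indices may be returned.
    """
    features = np.asarray(features, dtype=float)
    entropy = predictive_entropy(np.asarray(probs, dtype=float))
    labels = kmeans_labels(features, k=budget, seed=seed)
    picks = []
    for j in range(budget):
        members = np.flatnonzero(labels == j)
        if members.size:
            picks.append(int(members[entropy[members].argmax()]))
    return picks
```

In an active learning loop, the returned indices would be sent to the domain expert for labeling, the model retrained, and the selection repeated on the remaining pool. Because each query comes from a different cluster, highly correlated uncertain samples from the same region are not all queried at once.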