A Data-Driven Approach for Determining Weights in Global Similarity Functions

This paper presents a method for discovering initial global similarity weights during the development of a case-based reasoning (CBR) system. The approach is based on multiple feature relevance scoring methods and on the relevance of each feature within each scoring method. The objective is to exploit the characteristics of a dataset when creating similarity measures. The primary advantage of the method is that it is data-driven, which makes it usable in the early phase of CBR system development, when domain knowledge is not yet available. Experiments on multiple public datasets show that the method improves the ability of a CBR system's similarity measures to discriminate relevant similar cases. The results are evaluated with a method suited to unbalanced datasets.
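The abstract does not say which scoring methods are used or how their outputs are combined, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes numeric features, a binary label, two stand-in relevance scorers (absolute Pearson correlation and a scaled class-mean gap), and a plain average of the per-scorer normalised scores. All function names are hypothetical.

```python
from statistics import mean, pstdev


def abs_correlation(column, labels):
    """Stand-in scorer 1: absolute Pearson correlation with a binary label."""
    mx, my = mean(column), mean(labels)
    sx, sy = pstdev(column), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(column, labels)])
    return abs(cov / (sx * sy))


def class_mean_gap(column, labels):
    """Stand-in scorer 2: gap between class means, scaled by the feature range."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    span = max(column) - min(column)
    if not pos or not neg or span == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / span


def normalise(scores):
    """Rescale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def initial_weights(rows, labels, scorers=(abs_correlation, class_mean_gap)):
    """Average the normalised relevance scores of several scorers per feature.

    Averaging is an assumption made for this sketch; the paper may combine
    the scoring methods differently.
    """
    columns = list(zip(*rows))
    per_scorer = [
        normalise([scorer(col, labels) for col in columns]) for scorer in scorers
    ]
    combined = [mean(feature_scores) for feature_scores in zip(*per_scorer)]
    return normalise(combined)


def global_similarity(a, b, weights, ranges):
    """Weighted sum of local similarities, each 1 - |x - y| / feature range."""
    local = [
        1.0 - abs(x - y) / r if r else 1.0 for x, y, r in zip(a, b, ranges)
    ]
    return sum(w * s for w, s in zip(weights, local))


# Toy data: feature 0 separates the two classes clearly, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 3.0), (8.0, 4.0), (9.0, 6.0)]
labels = [0, 0, 1, 1]
weights = initial_weights(rows, labels)
ranges = [max(col) - min(col) for col in zip(*rows)]
print(weights)
print(global_similarity(rows[0], rows[1], weights, ranges))
```

On this toy data the more discriminative feature receives the larger weight, so differences in that feature lower the global similarity more than differences in the weaker one. In a real CBR system the weights obtained this way would serve only as a starting point to be refined once domain knowledge becomes available.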

Publisher

Springer

Journal

Lecture Notes in Computer Science (LNCS)