Prediction of Classifier Performance using Data Set Characteristics
In machine learning, choosing a suitable algorithm for a particular problem is often a matter of trial and error. A method for predicting the performance of various algorithms on a given data set could simplify this process greatly. In this thesis we consider whether the characteristics of a data set can be useful for predicting the performance of classification algorithms. To answer this question, two approaches were used. First, a structured literature review was performed, with the aim of identifying existing solutions for choosing a suitable algorithm. Five existing solutions matching the criteria used were found. Second, a system that uses linear regression to predict the error rates of several classifiers on a given data set was implemented. The predictions given by the system were not significantly better than those obtained by simply using the average error rate of each classifier. However, several of the data set characteristics considered were correlated with the performance of the various classification algorithms, and could potentially be used to predict the performance of classification algorithms with an approach other than the linear regression used in this thesis.
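To make the comparison described above concrete, the following is a minimal sketch, not the system implemented in the thesis. It assumes each data set is described by a few numeric characteristics and that the observed error rate of one classifier on each data set is known. The function names, the three example characteristics, and the synthetic numbers are all hypothetical and serve only to illustrate the idea of comparing a linear regression model against the average-error-rate baseline using leave-one-out evaluation.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [bias, w1, ..., wk]."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    """Predict error rates for the rows of X using weights from fit_linear."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ w


def loo_mae(X, y):
    """Leave-one-out mean absolute error of (regression, mean baseline)."""
    n = len(y)
    reg_err, base_err = [], []
    for i in range(n):
        train = np.arange(n) != i  # hold out data set i
        w = fit_linear(X[train], y[train])
        reg_err.append(abs(predict_linear(w, X[i:i + 1])[0] - y[i]))
        # Baseline: predict the average error rate seen on the other data sets.
        base_err.append(abs(y[train].mean() - y[i]))
    return float(np.mean(reg_err)), float(np.mean(base_err))


def demo():
    # Synthetic, made-up numbers: NOT results from the thesis.
    rng = np.random.default_rng(0)
    n_datasets = 30
    # Hypothetical characteristics: log(#instances), #attributes, class entropy.
    X = np.column_stack([
        rng.uniform(4, 10, n_datasets),
        rng.integers(2, 50, n_datasets).astype(float),
        rng.uniform(0.2, 1.0, n_datasets),
    ])
    # Hypothetical error rates of one classifier, loosely tied to entropy.
    y = np.clip(0.1 + 0.2 * X[:, 2] + rng.normal(0, 0.05, n_datasets), 0, 1)
    reg, base = loo_mae(X, y)
    print(f"regression MAE: {reg:.3f}, mean-baseline MAE: {base:.3f}")


if __name__ == "__main__":
    demo()
```

In a sketch like this, the regression is only worth using if its held-out error is clearly lower than that of the mean baseline; the abstract reports that this was not the case for the system evaluated in the thesis.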