Feature Analysis of Supervised Machine Learning Models in IDE-Based Learning Analytics - Exploring the use of correlation coefficients and p-values as feature utility measures through estimating student performance in an introductory programming course
Abstract
Due to the recent proliferation of large datasets collected from human behavior in digital environments, IDE-based learning analytics using supervised learning has emerged as a scientific field. However, due to its novelty, research methods tailored to the needs of IDE-based learning analytics have yet to be developed. In this paper, we present a research method for evaluating features used in supervised learning models in relation to their effect on the model's performance. We show that correlation coefficients in combination with p-values can be used as a measure of a feature's usefulness. The goal of the method is to enable researchers to understand and compare different features, allowing greater reuse of previous research and increasing the overall research value of supervised learning in IDE-based learning analytics.
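As a rough illustration of the idea of pairing a correlation coefficient with a p-value for each feature, the following minimal sketch may help. It is not the procedure used in this paper: the feature names (`compile_errors`, `keystroke_noise`), the toy data, and the helper functions are hypothetical, Pearson's r is assumed as the coefficient, and the p-value is estimated with a simple permutation test so that the example needs only the Python standard library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value for r, estimated by shuffling ys (an assumption of
    this sketch; an analytic test could be used instead)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: one value per student for each candidate feature.
features = {
    "compile_errors": [12, 9, 15, 4, 7, 3, 10, 2],
    "keystroke_noise": [5, 1, 4, 2, 6, 3, 1, 5],
}
exam_scores = [55, 62, 40, 85, 70, 90, 58, 95]

for name, values in features.items():
    r = pearson_r(values, exam_scores)
    p = permutation_p_value(values, exam_scores)
    print(f"{name}: r = {r:+.2f}, p = {p:.3f}")
```

Read this way, a feature with a large absolute r and a small p-value would be a candidate for a useful predictor, while a feature with r near zero or a large p-value would not; the thresholds for "large" and "small" are left open here.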