Feature Analysis of Supervised Machine Learning Models in IDE-Based Learning Analytics - Exploring the use of correlation coefficients and p-values as feature utility measures through estimating student performance in an introductory programming course
Master thesis
Permanent link
http://hdl.handle.net/11250/2572390
Publication date
2018
Abstract
Due to the recent proliferation of large datasets collected from human behavior in digital environments, IDE-based learning analytics using supervised learning has emerged as a scientific field. However, because the field is new, research methods tailored to its needs have yet to be developed. In this paper, we present a research method for evaluating features used in supervised learning models in terms of their effect on the model's performance. We show that correlation coefficients in combination with p-values can be used as a measure of a feature's usefulness. The goal of the method is to enable researchers to understand and compare different features, allowing greater use of previous research and increasing the overall research value of supervised learning in IDE-based learning analytics.
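As a rough illustration of the idea, the sketch below ranks candidate features by the strength of their correlation with a performance measure and flags those whose p-value falls under a threshold. It is not the thesis's implementation: the abstract does not say which correlation coefficient or significance test is used, so Pearson's r and a permutation-based p-value are assumed here, and the feature names, the toy data, and the 0.05 threshold are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed.

    Assumption: a permutation test stands in for whatever test the thesis uses.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def rank_features(features, target, alpha=0.05):
    """Return (name, r, p, significant) tuples sorted by |r|, strongest first."""
    rows = []
    for name, values in features.items():
        r = pearson_r(values, target)
        p = permutation_p_value(values, target)
        rows.append((name, r, p, p < alpha))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical toy data, invented purely for illustration.
exam_scores = [35, 42, 50, 55, 61, 68, 74, 80, 88, 93]
features = {
    "compile_errors": [30, 27, 25, 20, 21, 15, 12, 10, 6, 4],
    "session_count": [5, 9, 4, 8, 6, 7, 5, 9, 4, 8],
}
ranking = rank_features(features, exam_scores)
for name, r, p, significant in ranking:
    print(f"{name}: r={r:+.3f}, p={p:.4f}, significant={significant}")
```

Reporting the coefficient and the p-value side by side, rather than a single combined score, keeps effect size and statistical evidence separate, which is what makes features comparable across studies with different sample sizes.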