On Reporting Robust and Trustworthy Conclusions from Model Comparison Studies Involving Neural Networks and Randomness
Chapter, published version
Permanent link: https://hdl.handle.net/11250/3084340
Publication date: 2023
Original version: 10.1145/3589806.3600044

Abstract
The performance of a neural network differs across training runs when the only difference is the seed initializing the pseudo-random number generator used during training. In this paper, we are concerned with how random initialization affects the conclusions we draw from experiments with neural networks. We run a large number of repeated experiments using state-of-the-art models for time-series prediction and image classification to investigate this statistical phenomenon. Our investigations show that erroneous conclusions can easily be drawn from such experiments. Based on these observations, we propose several measures that will improve the robustness and trustworthiness of conclusions inferred from model comparison studies with small absolute effect sizes.
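The phenomenon the abstract describes can be illustrated with a minimal sketch: the same "model" trained under different seeds yields a spread of scores rather than a single number, so a comparison based on one run per model can mislead. The function name and the noise model below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def train_and_score(seed):
    # Stand-in for training a neural network: the seed drives random
    # weight initialization, so the final evaluation score varies per run.
    # The base accuracy (0.90) and noise scale (0.01) are illustrative.
    rng = np.random.default_rng(seed)
    return 0.90 + rng.normal(0.0, 0.01)

# Repeat the experiment across many seeds and inspect the spread,
# rather than reporting a single run.
scores = np.array([train_and_score(s) for s in range(50)])
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
print(f"range=[{scores.min():.3f}, {scores.max():.3f}]")
```

With a small absolute effect size between two models, the seed-induced spread shown here can easily exceed the difference in their mean scores, which is why the paper argues for repeated runs before drawing conclusions.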