A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique

Rao, Rajwant Singh; Dewangan, Seema; Mishra, Alok; Gupta, Manjari

dc.contributor.author	Rao, Rajwant Singh
dc.contributor.author	Dewangan, Seema
dc.contributor.author	Mishra, Alok
dc.contributor.author	Gupta, Manjari
dc.date.accessioned	2023-11-09T09:55:28Z
dc.date.available	2023-11-09T09:55:28Z
dc.date.created	2023-09-24T20:56:31Z
dc.date.issued	2023
dc.identifier.issn	2045-2322
dc.identifier.uri	https://hdl.handle.net/11250/3101596
dc.description.abstract	Detecting code smells can help reduce maintenance costs and improve source code quality. Code smells help developers and researchers understand several types of design flaws. High-severity code smells can cause significant problems for software and make the system harder to maintain. Assessing the severity of detected code smells is therefore essential, because it allows refactoring efforts to be prioritized. The class imbalance problem makes code smell severity detection more difficult. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are used to detect code smell severity. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied. The relevant features of each dataset are chosen using a feature selection technique based on principal component analysis. The severity of code smells is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. The study obtained a severity accuracy score of 0.99 with the Random forest and Decision tree approaches on the Long method code smell. The severity classification models are compared on accuracy and three other performance measures (Precision, Recall, and F-measure). Performance with and without SMOTE is also compared and presented. The results are promising and may pave the way for further studies in this area.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer Nature Ltd.	en_US
dc.relation.uri	https://rdcu.be/dnajm
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique	en_US
dc.title.alternative	A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	1-18	en_US
dc.source.volume	13	en_US
dc.source.journal	Scientific Reports	en_US
dc.identifier.doi	10.1038/s41598-023-43380-8
dc.identifier.cristin	2178359
dc.source.articlenumber	16245	en_US
cristin.ispublished	false
cristin.fulltext	original
cristin.qualitycode	1
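
The class-balancing step named in the abstract (SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `balance`, the default `k=5`, and the fixed seed are assumptions made for illustration, and only the Python standard library is used. The idea shown is the core of SMOTE: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest neighbours from the same class.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only; `minority` is a list of numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest same-class neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(list(row))
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            extra = smote(rows, target - len(rows), k, seed)
            X_out.extend(extra)
            y_out.extend([label] * len(extra))
    return X_out, y_out
```

Calling `balance(X, y)` on a feature matrix `X` (a list of rows) and a list of severity labels `y` returns the original rows followed by the synthetic ones, with every class brought up to the size of the largest. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.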

Associated file(s)

File name: Rao_et_al-2023-Scientific_Repo ...
Size: 1.138Mb
Format: PDF

This item appears in the following collection(s)

Institutt for vareproduksjon og byggteknikk [1075]
Publikasjoner fra CRIStin - NTNU [38679]

Except where otherwise noted, this item is licensed as Attribution 4.0 International