Show simple item record

dc.contributor.author: Khushi, Matloob
dc.contributor.author: Shaukat, Kamran
dc.contributor.author: Alam, Talha Mahboob
dc.contributor.author: Hameed, Ibrahim A.
dc.contributor.author: Uddin, Shahadat
dc.contributor.author: Luo, Suhuai
dc.contributor.author: Yang, Xiaoyan
dc.contributor.author: Reyes, Maranatha Consuelo
dc.date.accessioned: 2023-01-11T09:14:34Z
dc.date.available: 2023-01-11T09:14:34Z
dc.date.created: 2022-01-10T15:02:27Z
dc.date.issued: 2021
dc.identifier.citation: IEEE Access. 2021, 9 109960-109975. [en_US]
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/11250/3042589
dc.description.abstract: Medical datasets are usually imbalanced: negative cases severely outnumber positive cases. It is therefore essential to deal with this data skew when training machine learning algorithms. This study uses two representative lung cancer datasets, PLCO and NLST, with imbalance ratios (the ratio of majority-class samples to minority-class samples) of 24.7 and 25.0, respectively, to predict lung cancer incidence. It compares the performance of 23 class imbalance methods (resampling and hybrid systems) with three classical classifiers (logistic regression, random forest, and LinearSVC) to identify the imbalance techniques best suited to medical datasets. Resampling includes ten under-sampling methods (RUS, etc.), seven over-sampling methods (SMOTE, etc.), and two integrated sampling methods (SMOTEENN, SMOTE-Tomek). Hybrid systems include Balanced Bagging, among others. The results show that class imbalance learning can improve the classification ability of the model. Compared with the other imbalance techniques, under-sampling techniques have the highest standard deviation (SD), and over-sampling techniques have the lowest SD. Over-sampling is a stable method, and its AUC is generally higher than that of the other methods. With ROS, the random forest shows the best predictive ability and is more suitable for the lung cancer datasets used in this study. The code is available at https://mkhushi.github.io/. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.title: A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data [en_US]
dc.title.alternative: A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.source.pagenumber: 109960-109975 [en_US]
dc.source.volume: 9 [en_US]
dc.source.journal: IEEE Access [en_US]
dc.identifier.doi: 10.1109/ACCESS.2021.3102399
dc.identifier.cristin: 1977706
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
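
The abstract in this record defines the imbalance ratio as the number of majority-class samples divided by the number of minority-class samples, and reports ROS (commonly expanded as random over-sampling) as the best-performing resampler with a random forest. The sketch below illustrates both ideas using only the Python standard library. It is not the authors' code: the function names, the toy labels, and the fixed seed are assumptions made for illustration, and the data are not drawn from PLCO or NLST.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class is as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All rows belonging to this class; sampled with replacement.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (hypothetical): 250 negative cases, 10 positive cases.
labels = [0] * 250 + [1] * 10
rows = [[i] for i in range(len(labels))]

print(imbalance_ratio(labels))  # 25.0

new_rows, new_labels = random_over_sample(rows, labels)
print(imbalance_ratio(new_labels))  # 1.0 after balancing
```

In practice, resampling of this kind is normally applied to the training split only, so that evaluation data keep the original class distribution.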


Files in this item

This item appears in the following collection(s)

Except where otherwise noted, this item is licensed under Attribution 4.0 International.