Fast and Straightforward Feature Selection Method: A Case of High-Dimensional Low Sample Size Dataset in Malware Analysis

Banin, Sergii

dc.contributor.author	Banin, Sergii
dc.date.accessioned	2021-09-09T08:00:14Z
dc.date.available	2021-09-09T08:00:14Z
dc.date.created	2021-01-19T16:29:48Z
dc.date.issued	2021
dc.identifier.isbn	978-3-030-62581-8
dc.identifier.uri	https://hdl.handle.net/11250/2774822
dc.description.abstract	Malware analysis and detection is currently one of the major topics in the information security landscape. The two main approaches to analyzing and detecting malware are static and dynamic analysis. To detect running malware, one needs to perform dynamic analysis. Different methods of dynamic malware analysis produce different amounts of data, and methods that rely on low-level features produce very large amounts. Thus, machine learning methods are used to speed up and automate the analysis. The data fed into machine learning algorithms often requires preprocessing. Feature selection is one of the important preprocessing steps and often takes a significant amount of time. In this paper, we analyze the Intersection Subtraction (IS) feature selection method, which was first proposed and used on a high-dimensional dataset derived from behavioral malware analysis. We assess its computational complexity and analyze its potential strengths and weaknesses. Finally, we compare the Intersection Subtraction and Information Gain (IG) feature selection methods in terms of potential classification performance and time complexity. We apply them to a dataset of memory access patterns produced by malicious and benign executables. We found that the features selected by IS and IG are very different. Nevertheless, machine learning models trained with IS-selected features performed almost as well as those trained with IG-selected features, and IS achieved a classification accuracy of more than 99%. We also show that the IS feature selection method is faster than IG, which makes it attractive to those who need to analyze high-dimensional datasets.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Malware Analysis Using Artificial Intelligence and Deep Learning
dc.title	Fast and Straightforward Feature Selection Method: A Case of High-Dimensional Low Sample Size Dataset in Malware Analysis	en_US
dc.type	Chapter	en_US
dc.description.version	acceptedVersion	en_US
dc.source.pagenumber	455-476	en_US
dc.identifier.doi	10.1007/978-3-030-62582-5_18
dc.identifier.cristin	1874656
dc.description.localcode	This is a post-peer-review, pre-copyedit version of an article. Locked until 21.12.2022 due to copyright restrictions.	en_US
cristin.ispublished	true
cristin.fulltext	preprint
cristin.qualitycode	1

Associated file(s)

File name: simple_fs.pdf
Size: 265.7Kb
Format: PDF
Description: Banin

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2578]
Publikasjoner fra CRIStin - NTNU [38046]
