
dc.contributor.author Banin, Sergii
dc.date.accessioned 2021-09-09T08:00:14Z
dc.date.available 2021-09-09T08:00:14Z
dc.date.created 2021-01-19T16:29:48Z
dc.date.issued 2021
dc.identifier.isbn 978-3-030-62581-8
dc.identifier.uri https://hdl.handle.net/11250/2774822
dc.description.abstract Malware analysis and detection is currently one of the major topics in the information security landscape. The two main approaches to analyzing and detecting malware are static and dynamic analysis. To detect running malware, one needs to perform dynamic analysis. Different methods of dynamic malware analysis produce different amounts of data. The methods that rely on low-level features produce very large amounts of data. Thus, machine learning methods are used to speed up and automate the analysis. The data that is fed into machine learning algorithms often requires preprocessing. Feature selection is one of the important steps of data preprocessing and often takes a significant amount of time. In this paper, we analyze the Intersection Subtraction (IS) feature selection method, which was first proposed and used on a high-dimensional dataset derived from behavioral malware analysis. In our work, we assess its computational complexity and analyze its potential strengths and weaknesses. Finally, we compare the Intersection Subtraction and Information Gain (IG) feature selection methods in terms of potential classification performance and time complexity. We apply them to a dataset of memory access patterns produced by malicious and benign executables. We found that the features selected by IS and IG are very different. Nevertheless, machine learning models trained with IS-selected features performed almost as well as those trained with IG-selected features. IS achieved a classification accuracy of more than 99%. We also show that the IS feature selection method is faster than IG, which makes it attractive to those who need to analyze high-dimensional datasets. en_US
dc.language.iso eng en_US
dc.publisher Springer en_US
dc.relation.ispartof Malware Analysis Using Artificial Intelligence and Deep Learning
dc.title Fast and Straightforward Feature Selection Method: A Case of High-Dimensional Low Sample Size Dataset in Malware Analysis en_US
dc.type Chapter en_US
dc.description.version acceptedVersion en_US
dc.source.pagenumber 455-476 en_US
dc.identifier.doi https://doi.org/10.1007/978-3-030-62582-5_18
dc.identifier.cristin 1874656
dc.description.localcode This is a post-peer-review, pre-copyedit version of an article. Locked until 21.12.2022 due to copyright restrictions. en_US
cristin.ispublished true
cristin.fulltext preprint
cristin.qualitycode 1

