Similarity-based Intelligent Malware Type Detection through Multiple Sources of Dynamic Characteristics

2019

Malware analysts face challenges related to the increasing number of malware variants emerging every year. The conventional classification of Windows PE32 executables into benign and malicious is no longer sufficient. It needs refinement when it comes to detecting malware samples with similar functionality that belong to the same category. Thus, it is important to explore multiple sources of dynamic characteristics, such as indicators of compromise from disk, network and memory, that can substantially improve similarity-based malware detection. The goal of this thesis is to explore how multinomial malware classification can be improved by exploiting the available dynamic characteristics.

In this work, dynamic features were extracted with the help of the automated malware analysis system Cuckoo Sandbox. The samples were then classified into their ten respective families with the machine learning library Weka. The work analyses which dynamic features contribute most to multinomial malware classification and what performance gain they offer compared to static feature-based malware classification. An overall classification result of 87.5% was achieved. The best-performing dynamic features were the modified and opened registry keys, the created and modified files, the loaded DLLs and the resolved hosts. The best-performing classifier was Random Forest. In the future, this result could be improved by adding more dynamic features or by combining them with selected static features.
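The pipeline summarised above (Cuckoo Sandbox report, then feature vector, then Weka) can be illustrated with a small sketch. This is not the thesis's implementation. It assumes a Cuckoo 2.x-style `report.json` already loaded as a dictionary, where `behavior.summary` holds lists under keys such as `regkey_opened`, `regkey_written`, `file_created`, `file_written` and `dll_loaded`, and where `network.hosts` is a list of host strings. These key names should be checked against the Cuckoo version in use. The function names `extract_indicators` and `to_arff` are hypothetical. Each observed indicator becomes a binary attribute, and the samples are written in Weka's ARFF format so that a classifier such as Random Forest can be trained on them in Weka.

```python
# Minimal sketch: turn Cuckoo-style report dicts into a binary ARFF dataset.
# ASSUMPTION: report layout follows Cuckoo 2.x (behavior.summary.*, network.hosts);
# the helper names below are hypothetical, not part of Cuckoo or Weka.

# Summary keys assumed to correspond to the best-performing features
SUMMARY_KEYS = ("regkey_opened", "regkey_written",
                "file_created", "file_written", "dll_loaded")


def extract_indicators(report):
    """Return the set of 'key:value' indicator strings found in one report."""
    summary = report.get("behavior", {}).get("summary", {})
    indicators = set()
    for key in SUMMARY_KEYS:
        for value in summary.get(key, []):
            indicators.add(f"{key}:{value}")
    # ASSUMPTION: network.hosts is a list of plain host/IP strings
    for host in report.get("network", {}).get("hosts", []):
        indicators.add(f"host:{host}")
    return indicators


def to_arff(samples, relation="malware"):
    """Build ARFF text from a list of (indicator_set, family_label) pairs.

    Every distinct indicator becomes a {0,1} attribute named f0, f1, ...
    (short names avoid quoting registry paths). Family labels are assumed
    to be simple tokens without spaces or commas.
    """
    vocab = sorted(set().union(*(ind for ind, _ in samples)))
    families = sorted({label for _, label in samples})
    lines = [f"@RELATION {relation}", ""]
    for i in range(len(vocab)):
        lines.append(f"@ATTRIBUTE f{i} {{0,1}}")
    lines.append("@ATTRIBUTE family {" + ",".join(families) + "}")
    lines += ["", "@DATA"]
    for indicators, label in samples:
        row = ["1" if tok in indicators else "0" for tok in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"
```

The binary bag-of-indicators encoding is only one possible choice. Counts or TF-IDF weights could be used instead, and with thousands of distinct registry keys and files, Weka's sparse ARFF variant would be more compact.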

NTNU