Malware Detection Through Call Graphs
MetadataShow full item record
Each day, anti-virus companies receive large quantities of potentially harmful executables. Many of the malicious samples among these executables are variations of earlier encountered malware, created by their authors to evade pattern-based detection. Consequently, robust detection approaches are required, capable of recognizing similar samples automatically.In this thesis, malware detection through call graphs is studied. In a call graph, the functions of a binary executable are represented as vertices, and the calls between those functions as edges. By representing malware samples as call graphs, it is possible to derive and detect structural similarities between multiple samples. The latter can be used to implement generic malware detection schemes, which can proactively detect existing versions of the malware, as well as future releases with similar characteristics.To compare call graphs mutually, we compute pairwise graph similarity scores via graphmatchings which minimize an objective function known as the Graph Edit Distance. Finding exact graph matchings is intractable for large call graph instances. Hence we investigate several efficient approximation algorithms. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including variations on k-medoids clustering and DBSCAN clustering algorithms. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by virus analysts from F-Secure Corporation. Experiments show that it is indeed possible to accurately detect malware families using the DBSCAN clustering algorithm. Based on our results, we anticipate that in the future it is possible to use call graphs to analyse the emergence of new malware families, and ultimately to automate implementinggeneric protection schemes for malware families.