Feature Extraction and Static Analysis for Large-Scale Detection of Malware Types and Families

Grini, Lars Strande

dc.contributor.author	Grini, Lars Strande
dc.date.accessioned	2016-02-01T12:14:46Z
dc.date.available	2016-02-01T12:14:46Z
dc.date.issued	2016-02-01
dc.identifier.uri	http://hdl.handle.net/11250/2375515
dc.description.abstract	There are different methods of identifying malware, and a widespread one, found in almost every antivirus solution on the market today, is the signature-based approach. This approach uses a one-way cryptographic function to generate a unique hash of each file. Each hash is then checked against a database of hashes of known malware. This method produces almost no false positives, but it can only detect previously known malware and will in many cases produce false negatives. Malware authors exploit this weakness by changing a small part of the malicious code. This changes the entire hash of the file and leaves the malicious code undetectable until the sample is discovered, analyzed and added to the vendors' database(s). Given how easily malware authors can evade this approach, it is clear that we need other ways to identify malware. The two other main approaches are static analysis and behavior-based (dynamic) analysis. Previous research on such analysis has primarily focused on detecting whether a file is malicious or benign (binary classification), and there has been comprehensive work in these fields in the last few years. In the work we are proposing, we will apply machine learning methods to the results of static analysis to classify malicious Windows executables, not just as benign or malicious as in much previous research, but by malware family affiliation. To do this we will use a database of about 330,000 malicious executables. A challenge in this work will be the naming of the samples and families, as different antivirus vendors label samples with different names and follow no standard naming scheme. This is exemplified by the VirusTotal online scanner, which looks up a hash in 57 malware databases, each of which may report a different name for the same sample.
For the static analysis we will use the VirusTotal scanner as well as PEframe, an open-source tool for analyzing portable executables. The work performed in the thesis presents a novel approach to extracting and constructing features that can be used to estimate which type and family a malicious file belongs to, which can be useful for malware analysis and antivirus scanners. This contribution is novel because multinomial classification is applied to distinguish between different types and families.	nb_NO
dc.language.iso	eng	nb_NO
dc.subject	information security	nb_NO
dc.subject	malware	nb_NO
dc.subject	antivirus scanner	nb_NO
dc.title	Feature Extraction and Static Analysis for Large-Scale Detection of Malware Types and Families	nb_NO
dc.type	Master thesis	nb_NO
dc.subject.nsi	VDP::Mathematics and natural science: 400::Information and communication science: 420::Security and vulnerability: 424	nb_NO
dc.source.pagenumber	111	nb_NO
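
The signature-based approach described in the abstract, and the reason a small code change defeats it, can be illustrated with a short sketch. It assumes SHA-256 as the one-way hash function and uses a hypothetical in-memory set of hashes as a stand-in for a vendor's signature database; the record specifies neither.

```python
import hashlib


def file_signature(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string (assumed hash function)."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, known_hashes: set[str]) -> bool:
    """Signature check: flag the file only if its exact hash is in the database."""
    return file_signature(data) in known_hashes


# Hypothetical sample and signature database, for illustration only.
original = b"MZ\x90\x00" + b"malicious payload"
known_hashes = {file_signature(original)}

# Flip one bit in the last byte: the digest changes completely,
# so the modified variant no longer matches any stored signature.
variant = bytearray(original)
variant[-1] ^= 0x01
variant = bytes(variant)

print(is_known_malware(original, known_hashes))  # True
print(is_known_malware(variant, known_hashes))   # False
```

Because any change to the input yields an unrelated digest, a single altered byte is enough for the variant to miss the lookup, which is the weakness the abstract describes.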

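The naming problem mentioned in the abstract can be made concrete with one common heuristic for deriving a single family label from many vendor labels: extract a candidate family token from each label and take a majority vote. This is not necessarily the method used in the thesis; the vendor names, example labels, generic-token list and `min_votes` threshold below are all hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical list of tokens that describe platform or behavior rather than a family.
GENERIC_TOKENS = {
    "trojan", "malware", "generic", "agent", "variant",
    "heur", "backdoor", "worm", "virus", "adware", "riskware",
}


def candidate_family(label: str) -> Optional[str]:
    """Return the first non-generic alphabetic token of length >= 4, if any."""
    for token in re.split(r"[^A-Za-z0-9]+", label.lower()):
        if len(token) >= 4 and token.isalpha() and token not in GENERIC_TOKENS:
            return token
    return None


def majority_family(labels: dict[str, Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Majority vote over per-vendor candidates; None if no family gets min_votes."""
    votes: Counter = Counter()
    for label in labels.values():
        if label is None:  # engine did not flag the sample
            continue
        family = candidate_family(label)
        if family is not None:
            votes[family] += 1
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None


# Hypothetical labels for one sample from five engines.
labels = {
    "VendorA": "Trojan.Win32.Zbot.abcd",
    "VendorB": "Win32/Zbot.AB",
    "VendorC": "Trojan-Spy.Win32.Zbot.xyz",
    "VendorD": "Generic.Malware.Sdld",
    "VendorE": None,
}
print(majority_family(labels))  # zbot
```

Requiring a minimum number of agreeing engines trades coverage for label quality: samples on which vendors disagree are left unlabeled rather than assigned a noisy family.
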
Associated file(s)

Filename: LSGrini_2015.pdf
Size: 3.857 MB
Format: PDF
Description: Main article

This item appears in the following collection(s)

Department of Information Security and Communication Technology
