Extracting Cyber Threat Intelligence From Hacker Forums

Deliu, Isuf

dc.contributor.advisor	Franke, Katrin
dc.contributor.advisor	Leichter, Carl
dc.contributor.advisor	Nguyen, Hai Thanh
dc.contributor.author	Deliu, Isuf
dc.date.accessioned	2017-07-18T14:01:00Z
dc.date.available	2017-07-18T14:01:00Z
dc.date.created	2017-06-01
dc.date.issued	2017
dc.identifier	ntnudaim:18019
dc.identifier.uri	http://hdl.handle.net/11250/2448949
dc.description.abstract	The use of more sophisticated tools and methods by cyber criminals has pushed the cyber security community to look for enhancements to traditional security controls. Cyber Threat Intelligence represents one such proactive approach and includes the collection and analysis of information about potential threats from multiple diverse sources of data. The objective is to understand the methodology that different threat actors use to launch their campaigns, and to proactively adapt security controls to detect and prevent such activity. In addition to proprietary feeds, open sources such as social networks, news, and online blogs represent valuable sources of such information. Among them, hacker forums and other platforms that hackers use to communicate may contain vital information about security threats. The amount of data on such platforms, however, is enormous. Furthermore, their contents are not necessarily related to cyber security. Consequently, discovering relevant information through manual analysis is time-consuming, ineffective, and requires a significant amount of resources. In this thesis, we explore the capabilities of Machine Learning methods in the task of locating relevant threat intelligence in hacker forums. We propose combining supervised and unsupervised learning in a two-phase process for this purpose. In the first phase, recent developments in Deep Learning are compared against more traditional methods for text classification. The second phase applies unsupervised topic models to discover the latent themes of the information deemed relevant in the first phase. An optional third phase, which combines manual analysis with other (semi-)automated methods for exploring text data, is applied to validate the results and extract more detail from the data. We tested these methods on a real hacker forum.
The results of the experiments performed on manually labeled datasets show that even simple traditional methods such as Support Vector Machines with n-grams as features yield high performance on the task of classifying the contents of hacker posts. In addition, the experiments support our assumption that a considerable amount of the data on such platforms is general-purpose and not relevant to cyber security. The findings from the security-related data, however, include zero-day exploits, leaked credentials, and IP addresses of malicious proxy servers, among others. Therefore, the hacker community should be considered an important source of threat intelligence.
dc.language	eng
dc.publisher	NTNU
dc.subject	Information Security (MIS - 2 årig), Digital forensics
dc.title	Extracting Cyber Threat Intelligence From Hacker Forums
dc.type	Master thesis

Files in this item

Name: 18019_FULLTEXT.pdf
Size: 1.829Mb
Format: PDF

Name: 18019_COVER.pdf
Size: 1.556Mb
Format: PDF

This item appears in the following Collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2520]
