Efficient k-means Using Triangle Inequality on Spark for Cyber Security Analytics

Chitrakar, Ambika Shrestha; Petrovic, Slobodan

dc.contributor.author	Chitrakar, Ambika Shrestha
dc.contributor.author	Petrovic, Slobodan
dc.date.accessioned	2019-11-14T07:04:37Z
dc.date.available	2019-11-14T07:04:37Z
dc.date.created	2019-04-12T10:42:20Z
dc.date.issued	2019
dc.identifier.isbn	978-1-4503-6178-1
dc.identifier.uri	http://hdl.handle.net/11250/2628388
dc.description.abstract	As technology advances and the number of digital sources grows, the quantity of data increases every day, and with it the quantity of cyber security related data. Traditional security systems such as Intrusion Detection Systems (IDS) are not capable of handling such a growing amount of data in real time. Cyber security analytics is an alternative to such traditional security systems: it can use big data analytics techniques to provide a faster and more scalable framework for handling large amounts of cyber security related data in real time. k-means clustering is one of the commonly used clustering algorithms in cyber security analytics. It divides security related data into groups of similar entities, which in turn can help in gaining important insights into known and unknown attack patterns. This technique lets a security analyst restrict the analysis to the data in specific clusters. To improve performance, k-means can exploit the triangle inequality to skip many point-center distance computations without affecting the clustering results. In this paper, we re-formulate the parallel version of Elkan's k-means with triangle inequality (k-meansTI) algorithm, implement this algorithm on Apache Spark, and use it to classify Web attacks into different clusters. The paper also provides a speed comparison of our parallel k-meansTI on Spark with the Spark ML k-means clustering algorithm.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Association for Computing Machinery (ACM)	nb_NO
dc.relation.ispartof	IWSPA '19 Proceedings of the ACM International Workshop on Security and Privacy Analytics
dc.title	Efficient k-means Using Triangle Inequality on Spark for Cyber Security Analytics	nb_NO
dc.type	Chapter	nb_NO
dc.description.version	publishedVersion	nb_NO
dc.source.pagenumber	37-45	nb_NO
dc.identifier.doi	10.1145/3309182.3309187
dc.identifier.cristin	1691887
dc.relation.project	Norges forskningsråd: 248094	nb_NO
dc.description.localcode	This article will not be available due to copyright restrictions (c) 2019 by Association for Computing Machinery (ACM)	nb_NO
cristin.unitcode	194,63,30,0
cristin.unitname	Institutt for informasjonssikkerhet og kommunikasjonsteknologi
cristin.ispublished	true
cristin.fulltext	original
cristin.fulltext	postprint
cristin.qualitycode	1
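
The triangle-inequality shortcut mentioned in the abstract can be illustrated with a small single-machine sketch. This is not the authors' parallel k-meansTI implementation on Spark; it only shows one pruning rule from Elkan's approach in the assignment step: if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x than c_best, so d(x, c_j) need not be computed. The function names and the toy data are assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def assign_with_triangle_inequality(points, centers):
    """Assign each point to its nearest center, skipping provably useless distances.

    Hypothetical helper for illustration; returns (labels, skipped), where
    skipped counts the point-center distances that were never computed.
    """
    k = len(centers)
    # Center-center distances are computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) <= d(x, c_best) + d(x, c_j),
            # so d(c_best, c_j) >= 2 * d(x, c_best) implies d(x, c_j) >= d(x, c_best).
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-center distance."""
    return [min(range(len(centers)), key=lambda j: dist(x, centers[j]))
            for x in points]


# Toy data (assumed, not taken from the paper).
points = [(0.0, 0.0), (0.1, 0.2), (10.0, 10.0), (10.2, 9.9), (-8.0, 7.0)]
centers = [(0.0, 0.0), (10.0, 10.0), (-8.0, 8.0)]
labels, skipped = assign_with_triangle_inequality(points, centers)
```

On this toy data the pruned assignment gives the same labels as the brute-force one while computing fewer distances, which is the property the abstract describes as leaving the clustering results unaffected.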

Associated file(s)

Filename: publication6.pdf
Size: 1.151 MB
Format: PDF
Description: Chitrakar

Locked

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2521]
Publikasjoner fra CRIStin - NTNU [37219]
