Show simple item record

dc.contributor.author: Chitrakar, Ambika Shrestha
dc.contributor.author: Petrovic, Slobodan
dc.date.accessioned: 2019-06-12T09:29:17Z
dc.date.available: 2019-06-12T09:29:17Z
dc.date.created: 2019-02-28T13:51:47Z
dc.date.issued: 2018
dc.identifier.citation: IEEE International Conference on Big Data (Big Data). 2018 [nb_NO]
dc.identifier.isbn: 978-1-5386-5035-6
dc.identifier.uri: http://hdl.handle.net/11250/2600575
dc.description.abstract: Analyzing digital evidence has become a big data problem, which requires faster methods to handle them on a scalable framework. Standard k-means clustering algorithm is widely used in analyzing digital evidence. However, it is a hill-climbing method and it becomes slower with the increase of data, its dimension, and the number of cluster centers. This paper presents a framework to implement parallel k-means with triangle inequality (k-meansTI) algorithm on Spark, which is supposed to improve the speed of the standard k-means algorithm by skipping many point-center distance computations, giving the same clustering results. Our experimental results show that the parallel implementation of k-meansTI on Spark can be faster than the Spark ML k-means when a data set is large, does not contain many sparse data, and is high dimensional. These results are based on the experiments performed on six different data sets that have variations on the number of features and the number of data instances. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [nb_NO]
dc.relation.ispartof: 2018 IEEE International Conference on Big Data
dc.title: Analyzing Digital Evidence Using Parallel k-means with Triangle Inequality on Spark [nb_NO]
dc.type: Chapter [nb_NO]
dc.type: Peer reviewed [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.source.pagenumber: 3049-3058 [nb_NO]
dc.identifier.doi: 10.1109/BigData.2018.8622430
dc.identifier.cristin: 1681413
dc.description.localcode: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [nb_NO]
cristin.unitcode: 194,63,30,0
cristin.unitname: Institutt for informasjonssikkerhet og kommunikasjonsteknologi
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1