• norsk
    • English
  • English 
    • norsk
    • English
  • Login
View Item 
  •   Home
  • Øvrige samlinger
  • Publikasjoner fra CRIStin - NTNU
  • View Item
  •   Home
  • Øvrige samlinger
  • Publikasjoner fra CRIStin - NTNU
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

A vectorized k-means algorithm for compressed datasets: design and experimental analysis

Al Hasib, Abdullah; Cebrian, Juan Manuel; Natvig, Lasse
Journal article, Peer reviewed
Accepted version
Thumbnail
View/Open
Al Hasib (752.7Kb)
URI
http://hdl.handle.net/11250/2594414
Date
2018
Metadata
Show full item record
Collections
  • Institutt for datateknologi og informatikk [7422]
  • Publikasjoner fra CRIStin - NTNU [41935]
Original version
Journal of Supercomputing. 2018, 74 (6), 2705-2728.   10.1007/s11227-018-2310-0
Abstract
Clustering algorithms (i.e., Gaussian mixture models, k-means) tackle the problem of grouping a set of elements in such a way that elements from the same group (or cluster) have more similar properties to each other than to those elements in other clusters. This simple concept turns out to be the basis in complex algorithms from many application areas, including sequence analysis and genotyping in bioinformatics, medical imaging, antimicrobial activity, market research, social networking, etc. However, as the data volume continues to increase, the performance of clustering algorithms is heavily influenced by the memory subsystem. In this paper, we propose a novel and efficient implementation of Lloyd’s k-means clustering algorithm to substantially reduce data movement along the memory hierarchy. Our contributions are based on the fact that the vast majority of processors are equipped with powerful Single Instruction Multiple Data (SIMD) instructions that are, in most cases, underused. SIMD improves the CPU computational power and, if used wisely, can be seen as an opportunity to improve on the application data transfers by compressing/decompressing the data, specially for memory-bound applications. Our contributions include a SIMD-friendly data layout organization, in-register implementation of key functions and SIMD-based compression. We demonstrate that using our optimized SIMD-based compression method, it is possible to improve the performance and energy of k-means by a factor of 4.5x and 8.7x, respectively, for a i7 Haswell machine, and 22x and 22.2x for Xeon Phi: KNL, running a single thread.
Publisher
Springer Verlag
Journal
Journal of Supercomputing

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 

Browse

ArchiveCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsDocument TypesJournalsThis CollectionBy Issue DateAuthorsTitlesSubjectsDocument TypesJournals

My Account

Login

Statistics

View Usage Statistics

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit