Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication

Liu, Junhong; He, Xin; Liu, Weifeng; Tan, Guangming

dc.contributor.author	Liu, Junhong
dc.contributor.author	He, Xin
dc.contributor.author	Liu, Weifeng
dc.contributor.author	Tan, Guangming
dc.date.accessioned	2019-03-14T08:24:52Z
dc.date.available	2019-03-14T08:24:52Z
dc.date.created	2018-09-24T10:38:36Z
dc.date.issued	2018
dc.identifier.citation	International journal of parallel programming. 2018.	nb_NO
dc.identifier.issn	0885-7458
dc.identifier.uri	http://hdl.handle.net/11250/2589938
dc.description.abstract	General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block of a number of high-level algorithms and real-world applications. In recent years, several efficient SpGEMM algorithms have been proposed for many-core processors such as GPUs. However, their implementations of sparse accumulators, the core component of SpGEMM, mostly use low-speed on-chip shared memory and global memory, while high-speed registers are seriously underutilised. In this paper, we propose three novel register-aware SpGEMM algorithms for three representative sparse accumulators, i.e., sort, merge and hash, respectively. We fully utilise the GPU registers to fetch data, perform computations and store results. In the experiments, our algorithms deliver excellent performance on a benchmark suite of 205 sparse matrices from the SuiteSparse Matrix Collection. Specifically, on an Nvidia Pascal P100 GPU, our three register-aware sparse accumulators achieve on average 2.0× (up to 5.4×), 2.6× (up to 10.5×) and 1.7× (up to 5.2×) speedups over their original implementations in the libraries bhSPARSE, RMerge and NSPARSE, respectively.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Springer Verlag	nb_NO
dc.title	Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication	nb_NO
dc.title.alternative	Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	15	nb_NO
dc.source.journal	International journal of parallel programming	nb_NO
dc.identifier.doi	10.1007/s10766-018-0604-8
dc.identifier.cristin	1612824
dc.description.localcode	This is a post-peer-review, pre-copyedit version of an article published in [International journal of parallel programming]. Locked until 1.1.2020 due to copyright restrictions. The final authenticated version is available online at: https://doi.org/10.1007/s10766-018-0604-8	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	false
cristin.fulltext	preprint
cristin.qualitycode	1

Associated file(s)

Filename: spgemm_liu_ijpp.pdf
Size: 1.334Mb
Format: PDF
Description: Liu

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
Publikasjoner fra CRIStin - NTNU [37177]
