Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides

Liu, Weifeng; Li, Ang; Hogg, Jonathan D; Duff, Iain S; Vinter, Brian

dc.contributor.author	Liu, Weifeng
dc.contributor.author	Li, Ang
dc.contributor.author	Hogg, Jonathan D
dc.contributor.author	Duff, Iain S
dc.contributor.author	Vinter, Brian
dc.date.accessioned	2018-04-18T07:42:37Z
dc.date.available	2018-04-18T07:42:37Z
dc.date.created	2018-01-09T13:31:29Z
dc.date.issued	2017
dc.identifier.issn	1532-0626
dc.identifier.uri	http://hdl.handle.net/11250/2494579
dc.description.abstract	The sparse triangular solve kernels, SpTRSV and SpTRSM, are important building blocks for a number of numerical linear algebra routines. Parallelizing SpTRSV and SpTRSM on today's manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage to partition the components into a group of level‐sets or colour‐sets so that components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overheads between the sets. To address this, we propose in this paper novel approaches for SpTRSV and SpTRSM in which the ordering between components is naturally enforced within the solution stage. In this way, the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated. To further exploit the data‐parallelism, we also develop an adaptive scheme for efficiently processing multiple right‐hand sides in SpTRSM. A comparison with a state‐of‐the‐art library supplied by the GPU vendor, using 20 sparse matrices on the latest GPU device, shows that the proposed approach obtains an average speedup of over two for SpTRSV and up to an order of magnitude speedup for SpTRSM. In addition, our method is up to two orders of magnitude faster for the preprocessing stage than existing SpTRSV and SpTRSM methods.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Wiley	nb_NO
dc.title	Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides	nb_NO
dc.type	Journal article	nb_NO
dc.description.version	submittedVersion	nb_NO
dc.source.volume	29	nb_NO
dc.source.journal	Concurrency and Computation	nb_NO
dc.source.issue	21	nb_NO
dc.identifier.doi	10.1002/cpe.4244
dc.identifier.cristin	1538810
dc.relation.project	EC/H2020/752321	nb_NO
dc.description.localcode	This article will not be available due to copyright restrictions (c) 2017 by Wiley	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	true
cristin.fulltext	preprint
cristin.qualitycode	2
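
The abstract contrasts level-set preprocessing with an approach in which the ordering between components is enforced during the solve itself. The Python sketch below illustrates that contrast on a small lower-triangular system. It is a sequential emulation under stated assumptions, not the authors' GPU implementation: the matrix is assumed to be stored in CSC form with the diagonal entry first in each column, and the names `level_sets`, `sptrsv_syncfree`, `in_degree`, and `left_sum` are illustrative choices. Dependency counters stand in for the ordering mechanism, and a work queue stands in for parallel threads that would each wait on their own counter.

```python
from collections import deque


def level_sets(n, col_ptr, row_idx):
    """Level-set preprocessing: group components into sets whose members
    depend only on components in earlier sets."""
    level = [0] * n
    for j in range(n):  # lower triangular: column j only touches rows i >= j
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[k]
            if i != j:
                level[i] = max(level[i], level[j] + 1)
    sets = {}
    for i, lv in enumerate(level):
        sets.setdefault(lv, []).append(i)
    return [sets[lv] for lv in sorted(sets)]


def sptrsv_syncfree(n, col_ptr, row_idx, val, b):
    """Solve L x = b with ordering enforced by per-component counters
    rather than by barriers between level sets."""
    # Light preprocessing: count the off-diagonal entries in each row.
    in_degree = [0] * n
    for j in range(n):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            if row_idx[k] != j:
                in_degree[row_idx[k]] += 1

    left_sum = [0.0] * n  # accumulated contributions from solved components
    x = [0.0] * n
    # Components with no pending dependencies can be solved immediately.
    ready = deque(i for i in range(n) if in_degree[i] == 0)
    while ready:
        j = ready.popleft()
        diag = val[col_ptr[j]]  # assumption: diagonal stored first in column j
        x[j] = (b[j] - left_sum[j]) / diag
        # Push x[j] to every dependent row (atomic updates on a real GPU).
        for k in range(col_ptr[j] + 1, col_ptr[j + 1]):
            i = row_idx[k]
            left_sum[i] += val[k] * x[j]
            in_degree[i] -= 1
            if in_degree[i] == 0:
                ready.append(i)
    return x


# Example: L = [[2,0,0,0],[1,1,0,0],[0,0,4,0],[3,0,2,1]] in CSC form.
col_ptr = [0, 3, 4, 6, 7]
row_idx = [0, 1, 3, 1, 2, 3, 3]
val = [2.0, 1.0, 3.0, 1.0, 4.0, 2.0, 1.0]
b = [2.0, 3.0, 12.0, 13.0]  # equals L @ [1, 2, 3, 4]

sets = level_sets(4, col_ptr, row_idx)
x = sptrsv_syncfree(4, col_ptr, row_idx, val, b)
```

In the level-set variant, every set boundary is a point where all workers must synchronize. In the counter-based variant, a component proceeds as soon as its own dependencies are satisfied, and the only preprocessing is a single pass to count entries per row.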

Associated file(s)

Filename: sptrsm_liu_ccpe.pdf
Size: 1.575Mb
Format: PDF

Locked

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6766]
Publikasjoner fra CRIStin - NTNU [38015]
