Selective Replication in Memory-Side GPU Caches

Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven

dc.contributor.author	Zhao, Xia
dc.contributor.author	Jahre, Magnus
dc.contributor.author	Eeckhout, Lieven
dc.date.accessioned	2021-03-18T11:49:10Z
dc.date.available	2021-03-18T11:49:10Z
dc.date.created	2020-10-22T15:54:15Z
dc.date.issued	2020
dc.identifier.isbn	978-1-7281-7383-2
dc.identifier.uri	https://hdl.handle.net/11250/2734181
dc.description.abstract	Data-intensive applications put immense strain on the memory systems of Graphics Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests across independent units to provide high bandwidth by servicing requests (mostly) in parallel. We find that this strategy breaks down for shared data structures because the shared Last-Level Cache (LLC) organization used by contemporary GPUs stores shared data in a single LLC slice. Shared data requests are hence serialized - resulting in data-intensive applications not being provided with the bandwidth they require. A private LLC organization can provide high bandwidth, but it is often undesirable since it significantly reduces the effective LLC capacity. In this work, we propose the Selective Replication (SelRep) LLC which selectively replicates shared read-only data across LLC slices to improve bandwidth supply while ensuring that the LLC retains sufficient capacity to keep shared data cached. The compile-time component of SelRep LLC uses dataflow analysis to identify read-only shared data structures and uses a special-purpose load instruction for these accesses. The runtime component of SelRep LLC then monitors the caching behavior of these loads. Leveraging an analytical model, SelRep LLC chooses a replication degree that carefully balances the effective LLC bandwidth benefits of replication against its capacity cost. SelRep LLC consistently provides high performance to replication-sensitive applications across different data set sizes. More specifically, SelRep LLC improves performance by 19.7% and 11.1% on average (and up to 61.6% and 31.0%) compared to the shared LLC baseline and the state-of-the-art Adaptive LLC, respectively.	en_US
dc.language.iso	eng	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.ispartof	MICRO'53: Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture
dc.title	Selective Replication in Memory-Side GPU Caches	en_US
dc.type	Chapter	en_US
dc.description.version	acceptedVersion	en_US
dc.identifier.doi	https://doi.org/10.1109/MICRO50266.2020.00082
dc.identifier.cristin	1841595
dc.relation.project	Norges forskningsråd: 286596	en_US
dc.description.localcode	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1

Associated file(s)

File name: selrep-llc-micro20-preprint.pdf
Size: 1.808Mb
Format: PDF
Description: Zhao

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6552]
Publikasjoner fra CRIStin - NTNU [37228]
