Show simple item record

dc.contributor.author: Zhao, Xia
dc.contributor.author: Jahre, Magnus
dc.contributor.author: Eeckhout, Lieven
dc.date.accessioned: 2021-03-18T11:49:10Z
dc.date.available: 2021-03-18T11:49:10Z
dc.date.created: 2020-10-22T15:54:15Z
dc.date.issued: 2020
dc.identifier.isbn: 978-1-7281-7383-2
dc.identifier.uri: https://hdl.handle.net/11250/2734181
dc.description.abstract: Data-intensive applications put immense strain on the memory systems of Graphics Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests across independent units to provide high bandwidth by servicing requests (mostly) in parallel. We find that this strategy breaks down for shared data structures because the shared Last-Level Cache (LLC) organization used by contemporary GPUs stores shared data in a single LLC slice. Shared data requests are hence serialized - resulting in data-intensive applications not being provided with the bandwidth they require. A private LLC organization can provide high bandwidth, but it is often undesirable since it significantly reduces the effective LLC capacity. In this work, we propose the Selective Replication (SelRep) LLC which selectively replicates shared read-only data across LLC slices to improve bandwidth supply while ensuring that the LLC retains sufficient capacity to keep shared data cached. The compile-time component of SelRep LLC uses dataflow analysis to identify read-only shared data structures and uses a special-purpose load instruction for these accesses. The runtime component of SelRep LLC then monitors the caching behavior of these loads. Leveraging an analytical model, SelRep LLC chooses a replication degree that carefully balances the effective LLC bandwidth benefits of replication against its capacity cost. SelRep LLC consistently provides high performance to replication-sensitive applications across different data set sizes. More specifically, SelRep LLC improves performance by 19.7% and 11.1% on average (and up to 61.6% and 31.0%) compared to the shared LLC baseline and the state-of-the-art Adaptive LLC, respectively. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.ispartof: MICRO'53: Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture
dc.title: Selective Replication in Memory-Side GPU Caches [en_US]
dc.type: Chapter [en_US]
dc.description.version: acceptedVersion [en_US]
dc.identifier.doi: https://doi.org/10.1109/MICRO50266.2020.00082
dc.identifier.cristin: 1841595
dc.relation.project: Norges forskningsråd: 286596 [en_US]
dc.description.localcode: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1


Associated file(s)


This item appears in the following collection(s)
