
dc.contributor.author: Liu, Yuxi
dc.contributor.author: Zhao, Xia
dc.contributor.author: Jahre, Magnus
dc.contributor.author: Wang, Zhenlin
dc.contributor.author: Wang, Xiaolin
dc.contributor.author: Luo, Yingwei
dc.contributor.author: Eeckhout, Lieven
dc.date.accessioned: 2019-01-30T13:58:21Z
dc.date.available: 2019-01-30T13:58:21Z
dc.date.created: 2018-11-05T20:14:27Z
dc.date.issued: 2018
dc.identifier.issn: 2575-713X
dc.identifier.uri: http://hdl.handle.net/11250/2583165
dc.description.abstract: GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power efficiency. The key issue is that which memory address bits exhibit high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31X and power efficiency by 1.25X compared to state-of-the-art permutation-based address mapping. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [nb_NO]
dc.title: Get Out of the Valley: Power-Efficient Address Mapping for GPUs [nb_NO]
dc.title.alternative: Get Out of the Valley: Power-Efficient Address Mapping for GPUs [nb_NO]
dc.type: Journal article [nb_NO]
dc.type: Peer reviewed [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.source.journal: International Symposium on Computer Architecture [nb_NO]
dc.identifier.doi: 10.1109/ISCA.2018.00024
dc.identifier.cristin: 1627236
dc.description.localcode: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [nb_NO]
cristin.unitcode: 194,63,10,0
cristin.unitname: Institutt for datateknologi og informatikk
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
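
Illustrative note on the abstract: the two ideas it describes, a window-based per-bit entropy metric and an XOR-based address mapping, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract gives neither the exact metric definition nor the bit selection used by PAE, so the function names (`bit_entropy`, `window_entropy`, `xor_fold`), the non-overlapping windows, and the choice of bit positions are all hypothetical.

```python
import math


def bit_entropy(addresses, bit):
    """Shannon entropy (in bits) of one address bit over a group of requests."""
    if not addresses:
        return 0.0
    ones = sum((a >> bit) & 1 for a in addresses)
    p = ones / len(addresses)
    if p == 0.0 or p == 1.0:
        return 0.0  # the bit never changes inside this group
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def window_entropy(addresses, num_bits, window):
    """Average per-bit entropy over consecutive, non-overlapping windows.

    Assumption: a window stands in for "requests likely to co-exist in the
    memory system"; the real metric may define windows differently.
    """
    windows = [addresses[i:i + window] for i in range(0, len(addresses), window)]
    if not windows:
        return [0.0] * num_bits
    return [
        sum(bit_entropy(w, b) for w in windows) / len(windows)
        for b in range(num_bits)
    ]


def xor_fold(addr, target_bits, source_bits):
    """XOR the listed source bits into each target bit of the address.

    Assumption: target_bits play the role of bank/channel bits. As long as no
    source bit is also a target bit, the mapping is invertible, because the
    source bits pass through unchanged.
    """
    out = addr
    for t, sources in zip(target_bits, source_bits):
        x = (addr >> t) & 1
        for s in sources:
            x ^= (addr >> s) & 1
        out = (out & ~(1 << t)) | (x << t)
    return out


# Hypothetical stride-8 access pattern: the low three bits never change.
addresses = [i * 8 for i in range(16)]
before = window_entropy(addresses, num_bits=7, window=4)
mapped = [xor_fold(a, target_bits=[0], source_bits=[[3]]) for a in addresses]
after = window_entropy(mapped, num_bits=7, window=4)
```

In this toy pattern bit 5 takes both values across the whole trace but is constant inside every window of four requests, so its windowed entropy is zero; that is the kind of difference a window-based metric is meant to expose. Folding bit 3 into bit 0 gives the otherwise constant bit 0 the variability of bit 3, which is the general effect the abstract attributes to concentrating entropy into the bank and channel bits.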

