Get Out of the Valley: Power-Efficient Address Mapping for GPUs

Yuxi, Liu; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Lou, Yingwei; Eeckhout, Lieven

dc.contributor.author	Yuxi, Liu
dc.contributor.author	Zhao, Xia
dc.contributor.author	Jahre, Magnus
dc.contributor.author	Wang, Zhenlin
dc.contributor.author	Wang, Xiaolin
dc.contributor.author	Lou, Yingwei
dc.contributor.author	Eeckhout, Lieven
dc.date.accessioned	2019-01-30T13:58:21Z
dc.date.available	2019-01-30T13:58:21Z
dc.date.created	2018-11-05T20:14:27Z
dc.date.issued	2018
dc.identifier.issn	2575-713X
dc.identifier.uri	http://hdl.handle.net/11250/2583165
dc.description.abstract	GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that which memory address bits exhibit high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31X and power-efficiency by 1.25X compared to state-of-the-art permutation-based address mapping.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	nb_NO
dc.title	Get Out of the Valley: Power-Efficient Address Mapping for GPUs	nb_NO
dc.title.alternative	Get Out of the Valley: Power-Efficient Address Mapping for GPUs	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.journal	International Symposium on Computer Architecture	nb_NO
dc.identifier.doi	10.1109/ISCA.2018.00024
dc.identifier.cristin	1627236
dc.description.localcode	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1
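
Illustrative sketch (not part of the record): the abstract describes a window-based per-bit entropy metric and an XOR-based mapping that concentrates entropy into the channel and bank bits. The Python below is a minimal sketch of that idea under stated assumptions, not the paper's implementation. It assumes plain Shannon entropy per bit averaged over fixed-size, non-overlapping windows, which may differ from the paper's exact metric. The function names, the window size, and the bit positions (8 as a selection bit, 9 and 10 as row bits) are all hypothetical.

```python
import math


def bit_entropy(addresses, bit):
    """Shannon entropy (in bits) of one address bit over a list of addresses."""
    if not addresses:
        return 0.0
    ones = sum((a >> bit) & 1 for a in addresses)
    p = ones / len(addresses)
    if p in (0.0, 1.0):
        return 0.0  # a constant bit carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def window_entropy(addresses, bit, window):
    """Average per-bit entropy over consecutive, non-overlapping windows.

    Assumption: a window stands in for the requests that co-exist in the
    memory system at runtime.
    """
    chunks = [addresses[i:i + window] for i in range(0, len(addresses), window)]
    if not chunks:
        return 0.0
    return sum(bit_entropy(c, bit) for c in chunks) / len(chunks)


def xor_fold(address, source_bits):
    """XOR-reduce the selected input bits to a single output bit."""
    out = 0
    for b in source_bits:
        out ^= (address >> b) & 1
    return out


def remap(address, target_bits, sources):
    """Replace each target bit with the XOR of its source bits.

    The map stays one-to-one when each source list contains its own target
    bit and otherwise only bits that are not targets themselves.
    """
    result = address
    for t, src in zip(target_bits, sources):
        result &= ~(1 << t)                      # clear the target bit
        result |= xor_fold(address, src) << t    # write the folded bit
    return result


# Hypothetical strided access pattern: bit 8 never changes (an entropy valley).
addresses = [i * 512 for i in range(16)]
remapped = [remap(a, [8], [[8, 9, 10]]) for a in addresses]
print(window_entropy(addresses, 8, 4), window_entropy(remapped, 8, 4))
```

In this toy pattern the hypothetical selection bit 8 is constant in the input, so every request would target the same resource; folding in bits 9 and 10 gives the output bit 8 an even split within each window.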

Associated file(s)

Filename: liu-isca18-author-copy.pdf
Size: 790.6 KB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6552]
Publikasjoner fra CRIStin - NTNU [37237]